Hi Dumitru,

if eventually this patch got merged, please remove next lines from its commit 
message:

"Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
be done in the next patch."

Unfortunately I was unsuccessful in removing these probing direction (default 
60s).
Maybe you can give any hint how to solve this [1].

1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html

> On 11 Sep 2023, at 15:57, Dumitru Ceara <dce...@redhat.com> wrote:
> 
> On 9/11/23 14:54, Dumitru Ceara wrote:
>> On 6/14/23 20:02, Vladislav Odintsov wrote:
>>> Hi Mark,
>>> 
>>> thanks for taking time to look on this!
>>> 
>>> Your point with a hung OVS is really an interesting case.
>>> 
>>> From one hand it’s a possible situation, and from another I guess it’s much 
>>> higher probability for OVS to be busy by some other work rather than to 
>>> hang in a loop. In my installation the first happening sometimes (OVS 
>>> business by real work). The reasons can be different but I can’t say that 
>>> it’s better to drop OF connection in such a moment (possibly triggering ha 
>>> chassis-redirect ports failover).
>>> 
>>> At the same time if we’re talking about L3GW node, which has a gateway 
>>> chassis redirect port, the hung OVS should be detected by other nodes with 
>>> BFD probes (if it’s a ha chassis-redirect port).
>>> And if it’s a normal hypervisor (with just a vif ports), so what can we do 
>>> with it? I guess if we drop OF connection, it won’t make any positive 
>>> changes. Maybe just release memory needed for handling this connection and 
>>> not allocate buffers…? Unfortunately I can’t evaluate this.
>>> 
>>> Moreover, the OVN default is a disabled probing. What can be a possible 
>>> recommended values to set probes if to leave it as is? How should users 
>>> find out that they have to configure this? Currently in docs there is only 
>>> information that this option configures probe interval. But it’s not 
>>> obvious when to configure it and which value to set.
>>> 
>> 
>> In support of this, quoting the original commit that added the probe
>> config knob:
>> 
>> "The main intention of this patch is to fix this recompuation issue for
>> older versions (there by requesting backport), it still would be
>> beneficial with the incremental processing engine."
>> 
>> That was back in 2019:
>> 
>> https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589
>> 
>> Since then lots of performance improvements landed in OVN
>> (ovn-controller too), we're probably fine.  Checking
>> ovn-org/ovn-kubernetes code I see they were also setting
>> ovn-openflow-probe-interval "just in case" so they probably won't be
>> affected by it getting removed:
>> 
>> https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e
>> 
>>> I hope this makes sense.
>>> 
>> 
>> The patch in its current form looks OK to me, I'm OK to merge it if Mark
>> doesn't have anything against it; therefore:
>> 
>> Acked-by: Dumitru Ceara <dce...@redhat.com>
>> 
> 
> I forgot to mention this earlier but, since this patch was posted way
> before soft freeze and since the changes seem contained, it's probably
> fair to apply this to the 23.09 branch too.  What do you think, Mark?
> 
> Thanks!
> 
>> Regards,
>> Dumitru
>> 
>>> Also, which Han’s patch to OVS you’re talking about? I looked through open 
>>> OVS patches and haven’t seen any. Please point me out.
>>> 
>>> regards,
>>> Vladislav Odintsov
>>> 
>>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmich...@redhat.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Thanks for linking the email thread, since I had the exact same concern 
>>>> that Numan did about detecting if ovs-vswitchd goes down. It sounds like 
>>>> because of the nature of unix sockets, this disconnection is still 
>>>> detected by OVN and failover can occur as expected.
>>>> 
>>>> But what about if ovs-vswitchd is still running but in a "compromised" 
>>>> state. For instance, what if a coding error in ovs-vswitchd results in it 
>>>> running an infinite loop? In such a case, the unix connection would still 
>>>> be alive, but OVN would not be able to communicate with the ovs-vswitchd 
>>>> process.
>>>> 
>>>> Does this situation eventually lead to OVN disconnecting anyway after it 
>>>> fills up the unix socket's buffer? If so, how long does that take? Or does 
>>>> it lead to OVN slowly growing in memory usage while it waits forever to be 
>>>> able to write to the socket?
>>>> 
>>>> I think removing the hard-coded 5 second probe intervals from pinctrl.c 
>>>> and features.c is a good idea. But I think we should leave the 
>>>> configurable probe interval used by ofctrl.c for the case I described 
>>>> before. Since Han's patch to OVS should disable probe intervals from the 
>>>> OVS side by default, we would not trigger bad behavior simply because OVN 
>>>> has a lot of work to do during a recompute. What do you think?
>>>>> On 6/8/23 10:15, Vladislav Odintsov wrote:
>>>>> OpenFlow connection is established over unix socket, which is a reliable
>>>>> connection and doesn't require additional probing.
>>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
>>>>> direction is disabled for all three connections:
>>>>>  - OF flows management connection,
>>>>>  - OF features negotiation connection,
>>>>>  - pinctrl connection.
>>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as
>>>>> non-needed anymore.
>>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
>>>>> be done in the next patch.
>>>>> Reported-at: 
>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
>>>>> Signed-off-by: Vladislav Odintsov <odiv...@gmail.com>
>>>>> ---
>>>>> NEWS                            |  5 +++++
>>>>> controller/ofctrl.c             | 14 ++------------
>>>>> controller/ofctrl.h             |  4 +---
>>>>> controller/ovn-controller.8.xml | 14 --------------
>>>>> controller/ovn-controller.c     | 21 +--------------------
>>>>> controller/pinctrl.c            |  2 +-
>>>>> lib/features.c                  |  7 ++-----
>>>>> 7 files changed, 12 insertions(+), 55 deletions(-)
>>>>> diff --git a/NEWS b/NEWS
>>>>> index 645acea1f..bd63b187b 100644
>>>>> --- a/NEWS
>>>>> +++ b/NEWS
>>>>> @@ -1,5 +1,10 @@
>>>>> Post v23.06.0
>>>>> -------------
>>>>> +  - Disable OpenFlow inactivity probing between ovn-controller and OVS.
>>>>> +    OF connection is established over unix socket, which is a reliable
>>>>> +    connection method and doesn't require additional probing.
>>>>> +    external_ids:ovn-openflow-probe-interval configuration option for
>>>>> +    ovn-controller is no longer matter.
>>>>>   OVN v23.06.0 - 01 Jun 2023
>>>>> --------------------------
>>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>>>>> index 64a444ff6..788634494 100644
>>>>> --- a/controller/ofctrl.c
>>>>> +++ b/controller/ofctrl.c
>>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, 
>>>>> enum ofptype);
>>>>>   void
>>>>> ofctrl_init(struct ovn_extend_table *group_table,
>>>>> -            struct ovn_extend_table *meter_table,
>>>>> -            int inactivity_probe_interval)
>>>>> +            struct ovn_extend_table *meter_table)
>>>>> {
>>>>> -    swconn = rconn_create(inactivity_probe_interval, 0,
>>>>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>     tx_counter = rconn_packet_counter_create();
>>>>>     hmap_init(&installed_lflows);
>>>>>     hmap_init(&installed_pflows);
>>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
>>>>>     return rconn_is_connected(swconn);
>>>>> }
>>>>> -void
>>>>> -ofctrl_set_probe_interval(int probe_interval)
>>>>> -{
>>>>> -    if (swconn) {
>>>>> -        rconn_set_probe_interval(swconn, probe_interval);
>>>>> -    }
>>>>> -}
>>>>> -
>>>>> void
>>>>> ofctrl_get_memory_usage(struct simap *usage)
>>>>> {
>>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
>>>>> index 105f9370b..46bfccd85 100644
>>>>> --- a/controller/ofctrl.h
>>>>> +++ b/controller/ofctrl.h
>>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
>>>>>   /* Interface for OVN main loop. */
>>>>> void ofctrl_init(struct ovn_extend_table *group_table,
>>>>> -                 struct ovn_extend_table *meter_table,
>>>>> -                 int inactivity_probe_interval);
>>>>> +                 struct ovn_extend_table *meter_table);
>>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int,
>>>>>                 const struct ovsrec_open_vswitch_table *,
>>>>>                 struct shash *pending_ct_zones);
>>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct 
>>>>> ovn_desired_flow_table *,
>>>>>     bool ofctrl_is_connected(void);
>>>>> -void ofctrl_set_probe_interval(int probe_interval);
>>>>> void ofctrl_get_memory_usage(struct simap *usage);
>>>>>   #endif /* controller/ofctrl.h */
>>>>> diff --git a/controller/ovn-controller.8.xml 
>>>>> b/controller/ovn-controller.8.xml
>>>>> index f61f43008..52eb137d3 100644
>>>>> --- a/controller/ovn-controller.8.xml
>>>>> +++ b/controller/ovn-controller.8.xml
>>>>> @@ -147,20 +147,6 @@
>>>>>         </p>
>>>>>       </dd>
>>>>> -      <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
>>>>> -      <dd>
>>>>> -        <p>
>>>>> -          The inactivity probe interval of the OpenFlow connection to the
>>>>> -          OpenvSwitch integration bridge, in seconds.
>>>>> -          If the value is zero, it disables the connection keepalive 
>>>>> feature.
>>>>> -        </p>
>>>>> -
>>>>> -        <p>
>>>>> -          If the value is nonzero, then it will be forced to a value of
>>>>> -          at least 5s.
>>>>> -        </p>
>>>>> -      </dd>
>>>>> -
>>>>>       <dt><code>external_ids:ovn-encap-type</code></dt>
>>>>>       <dd>
>>>>>         <p>
>>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>>> index 3a81a13fb..732e7a690 100644
>>>>> --- a/controller/ovn-controller.c
>>>>> +++ b/controller/ovn-controller.c
>>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay;
>>>>>   #define DEFAULT_BRIDGE_NAME "br-int"
>>>>> #define DEFAULT_DATAPATH "system"
>>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
>>>>>   #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
>>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
>>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table 
>>>>> *ssl_table)
>>>>>     }
>>>>> }
>>>>> -static int
>>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
>>>>> -{
>>>>> -    const struct ovsrec_open_vswitch *cfg = 
>>>>> ovsrec_open_vswitch_first(ovs_idl);
>>>>> -    if (!cfg) {
>>>>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
>>>>> -    }
>>>>> -
>>>>> -    const struct ovsrec_open_vswitch_table *ovs_table =
>>>>> -        ovsrec_open_vswitch_table_get(ovs_idl);
>>>>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>>>> -    return get_chassis_external_id_value_int(
>>>>> -        &cfg->external_ids, chassis_id,
>>>>> -        "ovn-openflow-probe-interval", 
>>>>> OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
>>>>> -}
>>>>> -
>>>>> /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and
>>>>>  * updates 'sbdb_idl' with that pointer. */
>>>>> static void
>>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
>>>>>         engine_get_internal_data(&en_lb_data);
>>>>>       ofctrl_init(&lflow_output_data->group_table,
>>>>> -                &lflow_output_data->meter_table,
>>>>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>>>> +                &lflow_output_data->meter_table);
>>>>>     ofctrl_seqno_init();
>>>>>       unixctl_command_register("group-table-list", "", 0, 0,
>>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
>>>>>                      &reset_ovnsb_idl_min_index,
>>>>>                      &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
>>>>>         update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
>>>>> -        
>>>>> ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>>>>           struct ovsdb_idl_txn *ovnsb_idl_txn
>>>>>             = ovsdb_idl_loop_run(&ovnsb_idl_loop);
>>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
>>>>> index c396ad4c2..a86db3f32 100644
>>>>> --- a/controller/pinctrl.c
>>>>> +++ b/controller/pinctrl.c
>>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
>>>>>     static long long int svc_monitors_next_run_time = LLONG_MAX;
>>>>>     static long long int send_prefixd_time = LLONG_MAX;
>>>>> -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>       while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
>>>>>         long long int bfd_time = LLONG_MAX;
>>>>> diff --git a/lib/features.c b/lib/features.c
>>>>> index 97c9c86f0..ad3357570 100644
>>>>> --- a/lib/features.c
>>>>> +++ b/lib/features.c
>>>>> @@ -28,11 +28,10 @@
>>>>> #include "openvswitch/ofp-meter.h"
>>>>> #include "openvswitch/ofp-util.h"
>>>>> #include "ovn/features.h"
>>>>> +#include "controller/ofctrl.h"
>>>>>   VLOG_DEFINE_THIS_MODULE(features);
>>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
>>>>> -
>>>>> struct ovs_feature {
>>>>>     enum ovs_feature_value value;
>>>>>     const char *name;
>>>>> @@ -82,8 +81,7 @@ static void
>>>>> ovs_feature_rconn_setup(const char *br_name)
>>>>> {
>>>>>     if (!swconn) {
>>>>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0,
>>>>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>     }
>>>>>       if (!rconn_is_connected(swconn)) {
>>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
>>>>>         }
>>>>>         free(target);
>>>>>     }
>>>>> -    rconn_set_probe_interval(swconn, 
>>>>> FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
>>>>> }
>>>>>   static bool
>>>> 
>>> _______________________________________________
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Regards,
Vladislav Odintsov

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to