On Mon, Sep 11, 2023 at 7:31 AM Vladislav Odintsov <odiv...@gmail.com>
wrote:
>
> Hi Dumitru,
>
> if eventually this patch got merged, please remove next lines from its
commit message:
>
> "Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
> be done in the next patch."
>
> Unfortunately I was unsuccessful in removing these probing direction
(default 60s).
> Maybe you can give any hint how to solve this [1].
>
> 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html
>
> > On 11 Sep 2023, at 15:57, Dumitru Ceara <dce...@redhat.com> wrote:
> >
> > On 9/11/23 14:54, Dumitru Ceara wrote:
> >> On 6/14/23 20:02, Vladislav Odintsov wrote:
> >>> Hi Mark,
> >>>
> >>> thanks for taking time to look on this!
> >>>
> >>> Your point with a hung OVS is really an interesting case.
> >>>
> >>> From one hand it’s a possible situation, and from another I guess
it’s much higher probability for OVS to be busy by some other work rather
than to hang in a loop. In my installation the first happening sometimes
(OVS business by real work). The reasons can be different but I can’t say
that it’s better to drop OF connection in such a moment (possibly
triggering ha chassis-redirect ports failover).
> >>>
> >>> At the same time if we’re talking about L3GW node, which has a
gateway chassis redirect port, the hung OVS should be detected by other
nodes with BFD probes (if it’s a ha chassis-redirect port).
> >>> And if it’s a normal hypervisor (with just a vif ports), so what can
we do with it? I guess if we drop OF connection, it won’t make any positive
changes. Maybe just release memory needed for handling this connection and
not allocate buffers…? Unfortunately I can’t evaluate this.
> >>>
> >>> Moreover, the OVN default is a disabled probing. What can be a
possible recommended values to set probes if to leave it as is? How should
users find out that they have to configure this? Currently in docs there is
only information that this option configures probe interval. But it’s not
obvious when to configure it and which value to set.
> >>>
> >>
> >> In support of this, quoting the original commit that added the probe
> >> config knob:
> >>
> >> "The main intention of this patch is to fix this recompuation issue for
> >> older versions (there by requesting backport), it still would be
> >> beneficial with the incremental processing engine."
> >>
> >> That was back in 2019:
> >>
> >>
https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589

To clarify a little more, the effect of the above commit in fact changed
the default behavior of probe from disabled (which is the default behavior
of all unix/punix rconn) to 5s, and was changed back to default disabled by:
https://github.com/ovn-org/ovn/commit/b8af8549396e62d6523be18e104352e334825783

So I am totally fine to remove this config if no one would ever need to
enable the probe.

> >>
> >> Since then lots of performance improvements landed in OVN
> >> (ovn-controller too), we're probably fine.  Checking
> >> ovn-org/ovn-kubernetes code I see they were also setting
> >> ovn-openflow-probe-interval "just in case" so they probably won't be
> >> affected by it getting removed:
> >>
> >>
https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e
> >>
> >>> I hope this makes sense.
> >>>
> >>
> >> The patch in its current form looks OK to me, I'm OK to merge it if
Mark
> >> doesn't have anything against it; therefore:
> >>
> >> Acked-by: Dumitru Ceara <dce...@redhat.com>
> >>
> >
> > I forgot to mention this earlier but, since this patch was posted way
> > before soft freeze and since the changes seem contained, it's probably
> > fair to apply this to the 23.09 branch too.  What do you think, Mark?
> >
> > Thanks!
> >
> >> Regards,
> >> Dumitru
> >>
> >>> Also, which Han’s patch to OVS you’re talking about? I looked through
open OVS patches and haven’t seen any. Please point me out.
> >>>

I don't remember that I posted any patch related to this. We have a patch
downstream that changed the hardcoded 60s to 0 in OVS for the mgmt
controller.

> >>> regards,
> >>> Vladislav Odintsov
> >>>
> >>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmich...@redhat.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Thanks for linking the email thread, since I had the exact same
concern that Numan did about detecting if ovs-vswitchd goes down. It sounds
like because of the nature of unix sockets, this disconnection is still
detected by OVN and failover can occur as expected.
> >>>>
> >>>> But what about if ovs-vswitchd is still running but in a
"compromised" state. For instance, what if a coding error in ovs-vswitchd
results in it running an infinite loop? In such a case, the unix connection
would still be alive, but OVN would not be able to communicate with the
ovs-vswitchd process.
> >>>>
> >>>> Does this situation eventually lead to OVN disconnecting anyway
after it fills up the unix socket's buffer? If so, how long does that take?
Or does it lead to OVN slowly growing in memory usage while it waits
forever to be able to write to the socket?

Hi Mark, if OVS is in such infinite loop, disconnect it from ovn-controller
and reconnect would not help anyway, right?

> >>>>
> >>>> I think removing the hard-coded 5 second probe intervals from
pinctrl.c and features.c is a good idea. But I think we should leave the
configurable probe interval used by ofctrl.c for the case I described
before. Since Han's patch to OVS should disable probe intervals from the
OVS side by default, we would not trigger bad behavior simply because OVN
has a lot of work to do during a recompute. What do you think?
> >>>>> On 6/8/23 10:15, Vladislav Odintsov wrote:
> >>>>> OpenFlow connection is established over unix socket, which is a
reliable
> >>>>> connection and doesn't require additional probing.
> >>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
> >>>>> direction is disabled for all three connections:
> >>>>>  - OF flows management connection,
> >>>>>  - OF features negotiation connection,
> >>>>>  - pinctrl connection.
> >>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed
as
> >>>>> non-needed anymore.
> >>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing
will
> >>>>> be done in the next patch.
> >>>>> Reported-at:
https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
> >>>>> Signed-off-by: Vladislav Odintsov <odiv...@gmail.com>
> >>>>> ---
> >>>>> NEWS                            |  5 +++++
> >>>>> controller/ofctrl.c             | 14 ++------------
> >>>>> controller/ofctrl.h             |  4 +---
> >>>>> controller/ovn-controller.8.xml | 14 --------------
> >>>>> controller/ovn-controller.c     | 21 +--------------------
> >>>>> controller/pinctrl.c            |  2 +-
> >>>>> lib/features.c                  |  7 ++-----
> >>>>> 7 files changed, 12 insertions(+), 55 deletions(-)
> >>>>> diff --git a/NEWS b/NEWS
> >>>>> index 645acea1f..bd63b187b 100644
> >>>>> --- a/NEWS
> >>>>> +++ b/NEWS
> >>>>> @@ -1,5 +1,10 @@
> >>>>> Post v23.06.0
> >>>>> -------------
> >>>>> +  - Disable OpenFlow inactivity probing between ovn-controller and
OVS.
> >>>>> +    OF connection is established over unix socket, which is a
reliable
> >>>>> +    connection method and doesn't require additional probing.
> >>>>> +    external_ids:ovn-openflow-probe-interval configuration option
for
> >>>>> +    ovn-controller is no longer matter.

nit: s/is no longer matter/is removed

Acked-by: Han Zhou <hz...@ovn.org>

(waiting for Mark's confirm about this concerns)

> >>>>>   OVN v23.06.0 - 01 Jun 2023
> >>>>> --------------------------
> >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> >>>>> index 64a444ff6..788634494 100644
> >>>>> --- a/controller/ofctrl.c
> >>>>> +++ b/controller/ofctrl.c
> >>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct
ofp_header *, enum ofptype);
> >>>>>   void
> >>>>> ofctrl_init(struct ovn_extend_table *group_table,
> >>>>> -            struct ovn_extend_table *meter_table,
> >>>>> -            int inactivity_probe_interval)
> >>>>> +            struct ovn_extend_table *meter_table)
> >>>>> {
> >>>>> -    swconn = rconn_create(inactivity_probe_interval, 0,
> >>>>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>>     tx_counter = rconn_packet_counter_create();
> >>>>>     hmap_init(&installed_lflows);
> >>>>>     hmap_init(&installed_pflows);
> >>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
> >>>>>     return rconn_is_connected(swconn);
> >>>>> }
> >>>>> -void
> >>>>> -ofctrl_set_probe_interval(int probe_interval)
> >>>>> -{
> >>>>> -    if (swconn) {
> >>>>> -        rconn_set_probe_interval(swconn, probe_interval);
> >>>>> -    }
> >>>>> -}
> >>>>> -
> >>>>> void
> >>>>> ofctrl_get_memory_usage(struct simap *usage)
> >>>>> {
> >>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> >>>>> index 105f9370b..46bfccd85 100644
> >>>>> --- a/controller/ofctrl.h
> >>>>> +++ b/controller/ofctrl.h
> >>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
> >>>>>   /* Interface for OVN main loop. */
> >>>>> void ofctrl_init(struct ovn_extend_table *group_table,
> >>>>> -                 struct ovn_extend_table *meter_table,
> >>>>> -                 int inactivity_probe_interval);
> >>>>> +                 struct ovn_extend_table *meter_table);
> >>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int,
> >>>>>                 const struct ovsrec_open_vswitch_table *,
> >>>>>                 struct shash *pending_ct_zones);
> >>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct
ovn_desired_flow_table *,
> >>>>>     bool ofctrl_is_connected(void);
> >>>>> -void ofctrl_set_probe_interval(int probe_interval);
> >>>>> void ofctrl_get_memory_usage(struct simap *usage);
> >>>>>   #endif /* controller/ofctrl.h */
> >>>>> diff --git a/controller/ovn-controller.8.xml
b/controller/ovn-controller.8.xml
> >>>>> index f61f43008..52eb137d3 100644
> >>>>> --- a/controller/ovn-controller.8.xml
> >>>>> +++ b/controller/ovn-controller.8.xml
> >>>>> @@ -147,20 +147,6 @@
> >>>>>         </p>
> >>>>>       </dd>
> >>>>> -
 <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
> >>>>> -      <dd>
> >>>>> -        <p>
> >>>>> -          The inactivity probe interval of the OpenFlow connection
to the
> >>>>> -          OpenvSwitch integration bridge, in seconds.
> >>>>> -          If the value is zero, it disables the connection
keepalive feature.
> >>>>> -        </p>
> >>>>> -
> >>>>> -        <p>
> >>>>> -          If the value is nonzero, then it will be forced to a
value of
> >>>>> -          at least 5s.
> >>>>> -        </p>
> >>>>> -      </dd>
> >>>>> -
> >>>>>       <dt><code>external_ids:ovn-encap-type</code></dt>
> >>>>>       <dd>
> >>>>>         <p>
> >>>>> diff --git a/controller/ovn-controller.c
b/controller/ovn-controller.c
> >>>>> index 3a81a13fb..732e7a690 100644
> >>>>> --- a/controller/ovn-controller.c
> >>>>> +++ b/controller/ovn-controller.c
> >>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func
debug_ignore_startup_delay;
> >>>>>   #define DEFAULT_BRIDGE_NAME "br-int"
> >>>>> #define DEFAULT_DATAPATH "system"
> >>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
> >>>>>   #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
> >>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
> >>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct
ovsrec_ssl_table *ssl_table)
> >>>>>     }
> >>>>> }
> >>>>> -static int
> >>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
> >>>>> -{
> >>>>> -    const struct ovsrec_open_vswitch *cfg =
ovsrec_open_vswitch_first(ovs_idl);
> >>>>> -    if (!cfg) {
> >>>>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
> >>>>> -    }
> >>>>> -
> >>>>> -    const struct ovsrec_open_vswitch_table *ovs_table =
> >>>>> -        ovsrec_open_vswitch_table_get(ovs_idl);
> >>>>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> >>>>> -    return get_chassis_external_id_value_int(
> >>>>> -        &cfg->external_ids, chassis_id,
> >>>>> -        "ovn-openflow-probe-interval",
OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
> >>>>> -}
> >>>>> -
> >>>>> /* Retrieves the pointer to the OVN Southbound database from
'ovs_idl' and
> >>>>>  * updates 'sbdb_idl' with that pointer. */
> >>>>> static void
> >>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
> >>>>>         engine_get_internal_data(&en_lb_data);
> >>>>>       ofctrl_init(&lflow_output_data->group_table,
> >>>>> -                &lflow_output_data->meter_table,
> >>>>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
> >>>>> +                &lflow_output_data->meter_table);
> >>>>>     ofctrl_seqno_init();
> >>>>>       unixctl_command_register("group-table-list", "", 0, 0,
> >>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
> >>>>>                      &reset_ovnsb_idl_min_index,
> >>>>>                      &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
> >>>>>         update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
> >>>>> -
 ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
> >>>>>           struct ovsdb_idl_txn *ovnsb_idl_txn
> >>>>>             = ovsdb_idl_loop_run(&ovnsb_idl_loop);
> >>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> >>>>> index c396ad4c2..a86db3f32 100644
> >>>>> --- a/controller/pinctrl.c
> >>>>> +++ b/controller/pinctrl.c
> >>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
> >>>>>     static long long int svc_monitors_next_run_time = LLONG_MAX;
> >>>>>     static long long int send_prefixd_time = LLONG_MAX;
> >>>>> -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>>       while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
> >>>>>         long long int bfd_time = LLONG_MAX;
> >>>>> diff --git a/lib/features.c b/lib/features.c
> >>>>> index 97c9c86f0..ad3357570 100644
> >>>>> --- a/lib/features.c
> >>>>> +++ b/lib/features.c
> >>>>> @@ -28,11 +28,10 @@
> >>>>> #include "openvswitch/ofp-meter.h"
> >>>>> #include "openvswitch/ofp-util.h"
> >>>>> #include "ovn/features.h"
> >>>>> +#include "controller/ofctrl.h"
> >>>>>   VLOG_DEFINE_THIS_MODULE(features);
> >>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
> >>>>> -
> >>>>> struct ovs_feature {
> >>>>>     enum ovs_feature_value value;
> >>>>>     const char *name;
> >>>>> @@ -82,8 +81,7 @@ static void
> >>>>> ovs_feature_rconn_setup(const char *br_name)
> >>>>> {
> >>>>>     if (!swconn) {
> >>>>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC,
0,
> >>>>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 <<
OFP15_VERSION);
> >>>>>     }
> >>>>>       if (!rconn_is_connected(swconn)) {
> >>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
> >>>>>         }
> >>>>>         free(target);
> >>>>>     }
> >>>>> -    rconn_set_probe_interval(swconn,
FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
> >>>>> }
> >>>>>   static bool
> >>>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> d...@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
> Regards,
> Vladislav Odintsov
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to