On Fri, Sep 22, 2023 at 2:40 PM Han Zhou <hz...@ovn.org> wrote:
>
> On Mon, Sep 11, 2023 at 7:31 AM Vladislav Odintsov <odiv...@gmail.com>
> wrote:
> >
> > Hi Dumitru,
> >
> > if eventually this patch got merged, please remove next lines from its
> commit message:
> >
> > "Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
> > be done in the next patch."
> >
> > Unfortunately I was unsuccessful in removing these probing direction
> (default 60s).
> > Maybe you can give any hint how to solve this [1].
> >
> > 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html
> >
> > > On 11 Sep 2023, at 15:57, Dumitru Ceara <dce...@redhat.com> wrote:
> > >
> > > On 9/11/23 14:54, Dumitru Ceara wrote:
> > >> On 6/14/23 20:02, Vladislav Odintsov wrote:
> > >>> Hi Mark,
> > >>>
> > >>> thanks for taking time to look on this!
> > >>>
> > >>> Your point with a hung OVS is really an interesting case.
> > >>>
> > >>> From one hand it’s a possible situation, and from another I guess
> it’s much higher probability for OVS to be busy by some other work rather
> than to hang in a loop. In my installation the first happening sometimes
> (OVS business by real work). The reasons can be different but I can’t say
> that it’s better to drop OF connection in such a moment (possibly
> triggering ha chassis-redirect ports failover).
> > >>>
> > >>> At the same time if we’re talking about L3GW node, which has a
> gateway chassis redirect port, the hung OVS should be detected by other
> nodes with BFD probes (if it’s a ha chassis-redirect port).
> > >>> And if it’s a normal hypervisor (with just a vif ports), so what can
> we do with it? I guess if we drop OF connection, it won’t make any positive
> changes. Maybe just release memory needed for handling this connection and
> not allocate buffers…? Unfortunately I can’t evaluate this.
> > >>>
> > >>> Moreover, the OVN default is a disabled probing. What can be a
> possible recommended values to set probes if to leave it as is? How should
> users find out that they have to configure this? Currently in docs there is
> only information that this option configures probe interval. But it’s not
> obvious when to configure it and which value to set.
> > >>>
> > >>
> > >> In support of this, quoting the original commit that added the probe
> > >> config knob:
> > >>
> > >> "The main intention of this patch is to fix this recompuation issue for
> > >> older versions (there by requesting backport), it still would be
> > >> beneficial with the incremental processing engine."
> > >>
> > >> That was back in 2019:
> > >>
> > >>
> https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589
>
> To clarify a little more, the effect of the above commit in fact changed
> the default behavior of probe from disabled (which is the default behavior
> of all unix/punix rconn) to 5s, and was changed back to default disabled by:
> https://github.com/ovn-org/ovn/commit/b8af8549396e62d6523be18e104352e334825783
>
> So I am totally fine to remove this config if no one would ever need to
> enable the probe.
>
> > >>
> > >> Since then lots of performance improvements landed in OVN
> > >> (ovn-controller too), we're probably fine.  Checking
> > >> ovn-org/ovn-kubernetes code I see they were also setting
> > >> ovn-openflow-probe-interval "just in case" so they probably won't be
> > >> affected by it getting removed:
> > >>
> > >>
> https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e
> > >>
> > >>> I hope this makes sense.
> > >>>
> > >>
> > >> The patch in its current form looks OK to me, I'm OK to merge it if
> Mark
> > >> doesn't have anything against it; therefore:
> > >>
> > >> Acked-by: Dumitru Ceara <dce...@redhat.com>
> > >>
> > >
> > > I forgot to mention this earlier but, since this patch was posted way
> > > before soft freeze and since the changes seem contained, it's probably
> > > fair to apply this to the 23.09 branch too.  What do you think, Mark?
> > >
> > > Thanks!
> > >
> > >> Regards,
> > >> Dumitru
> > >>
> > >>> Also, which Han’s patch to OVS you’re talking about? I looked through
> open OVS patches and haven’t seen any. Please point me out.
> > >>>
>
> I don't remember that I posted any patch related to this. We have a patch
> downstream that changed the hardcoded 60s to 0 in OVS for the mgmt
> controller.
>
> > >>> regards,
> > >>> Vladislav Odintsov
> > >>>
> > >>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmich...@redhat.com> wrote:
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> Thanks for linking the email thread, since I had the exact same
> concern that Numan did about detecting if ovs-vswitchd goes down. It sounds
> like because of the nature of unix sockets, this disconnection is still
> detected by OVN and failover can occur as expected.
> > >>>>
> > >>>> But what about if ovs-vswitchd is still running but in a
> "compromised" state. For instance, what if a coding error in ovs-vswitchd
> results in it running an infinite loop? In such a case, the unix connection
> would still be alive, but OVN would not be able to communicate with the
> ovs-vswitchd process.
> > >>>>
> > >>>> Does this situation eventually lead to OVN disconnecting anyway
> after it fills up the unix socket's buffer? If so, how long does that take?
> Or does it lead to OVN slowly growing in memory usage while it waits
> forever to be able to write to the socket?
>
> Hi Mark, if OVS is in such infinite loop, disconnect it from ovn-controller
> and reconnect would not help anyway, right?
>
> > >>>>
> > >>>> I think removing the hard-coded 5 second probe intervals from
> pinctrl.c and features.c is a good idea. But I think we should leave the
> configurable probe interval used by ofctrl.c for the case I described
> before. Since Han's patch to OVS should disable probe intervals from the
> OVS side by default, we would not trigger bad behavior simply because OVN
> has a lot of work to do during a recompute. What do you think?
> > >>>>> On 6/8/23 10:15, Vladislav Odintsov wrote:
> > >>>>> OpenFlow connection is established over unix socket, which is a
> reliable
> > >>>>> connection and doesn't require additional probing.
> > >>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
> > >>>>> direction is disabled for all three connections:
> > >>>>>  - OF flows management connection,
> > >>>>>  - OF features negotiation connection,
> > >>>>>  - pinctrl connection.
> > >>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed
> as
> > >>>>> non-needed anymore.
> > >>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing
> will
> > >>>>> be done in the next patch.
> > >>>>> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
> > >>>>> Signed-off-by: Vladislav Odintsov <odiv...@gmail.com>
> > >>>>> ---
> > >>>>> NEWS                            |  5 +++++
> > >>>>> controller/ofctrl.c             | 14 ++------------
> > >>>>> controller/ofctrl.h             |  4 +---
> > >>>>> controller/ovn-controller.8.xml | 14 --------------
> > >>>>> controller/ovn-controller.c     | 21 +--------------------
> > >>>>> controller/pinctrl.c            |  2 +-
> > >>>>> lib/features.c                  |  7 ++-----
> > >>>>> 7 files changed, 12 insertions(+), 55 deletions(-)
> > >>>>> diff --git a/NEWS b/NEWS
> > >>>>> index 645acea1f..bd63b187b 100644
> > >>>>> --- a/NEWS
> > >>>>> +++ b/NEWS
> > >>>>> @@ -1,5 +1,10 @@
> > >>>>> Post v23.06.0
> > >>>>> -------------
> > >>>>> +  - Disable OpenFlow inactivity probing between ovn-controller and
> OVS.
> > >>>>> +    OF connection is established over unix socket, which is a
> reliable
> > >>>>> +    connection method and doesn't require additional probing.
> > >>>>> +    external_ids:ovn-openflow-probe-interval configuration option
> for
> > >>>>> +    ovn-controller is no longer matter.
>
> nit: s/is no longer matter/is removed
>
> Acked-by: Han Zhou <hz...@ovn.org>
>
> (waiting for Mark's confirm about this concerns)

OK.  I went ahead and applied this patch to the main.  Since there
were no comments from Mark I assume he is fine with the patch.

Numan

>
> > >>>>>   OVN v23.06.0 - 01 Jun 2023
> > >>>>> --------------------------
> > >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > >>>>> index 64a444ff6..788634494 100644
> > >>>>> --- a/controller/ofctrl.c
> > >>>>> +++ b/controller/ofctrl.c
> > >>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct
> ofp_header *, enum ofptype);
> > >>>>>   void
> > >>>>> ofctrl_init(struct ovn_extend_table *group_table,
> > >>>>> -            struct ovn_extend_table *meter_table,
> > >>>>> -            int inactivity_probe_interval)
> > >>>>> +            struct ovn_extend_table *meter_table)
> > >>>>> {
> > >>>>> -    swconn = rconn_create(inactivity_probe_interval, 0,
> > >>>>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>>     tx_counter = rconn_packet_counter_create();
> > >>>>>     hmap_init(&installed_lflows);
> > >>>>>     hmap_init(&installed_pflows);
> > >>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
> > >>>>>     return rconn_is_connected(swconn);
> > >>>>> }
> > >>>>> -void
> > >>>>> -ofctrl_set_probe_interval(int probe_interval)
> > >>>>> -{
> > >>>>> -    if (swconn) {
> > >>>>> -        rconn_set_probe_interval(swconn, probe_interval);
> > >>>>> -    }
> > >>>>> -}
> > >>>>> -
> > >>>>> void
> > >>>>> ofctrl_get_memory_usage(struct simap *usage)
> > >>>>> {
> > >>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> > >>>>> index 105f9370b..46bfccd85 100644
> > >>>>> --- a/controller/ofctrl.h
> > >>>>> +++ b/controller/ofctrl.h
> > >>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
> > >>>>>   /* Interface for OVN main loop. */
> > >>>>> void ofctrl_init(struct ovn_extend_table *group_table,
> > >>>>> -                 struct ovn_extend_table *meter_table,
> > >>>>> -                 int inactivity_probe_interval);
> > >>>>> +                 struct ovn_extend_table *meter_table);
> > >>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int,
> > >>>>>                 const struct ovsrec_open_vswitch_table *,
> > >>>>>                 struct shash *pending_ct_zones);
> > >>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct
> ovn_desired_flow_table *,
> > >>>>>     bool ofctrl_is_connected(void);
> > >>>>> -void ofctrl_set_probe_interval(int probe_interval);
> > >>>>> void ofctrl_get_memory_usage(struct simap *usage);
> > >>>>>   #endif /* controller/ofctrl.h */
> > >>>>> diff --git a/controller/ovn-controller.8.xml
> b/controller/ovn-controller.8.xml
> > >>>>> index f61f43008..52eb137d3 100644
> > >>>>> --- a/controller/ovn-controller.8.xml
> > >>>>> +++ b/controller/ovn-controller.8.xml
> > >>>>> @@ -147,20 +147,6 @@
> > >>>>>         </p>
> > >>>>>       </dd>
> > >>>>> -
>  <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
> > >>>>> -      <dd>
> > >>>>> -        <p>
> > >>>>> -          The inactivity probe interval of the OpenFlow connection
> to the
> > >>>>> -          OpenvSwitch integration bridge, in seconds.
> > >>>>> -          If the value is zero, it disables the connection
> keepalive feature.
> > >>>>> -        </p>
> > >>>>> -
> > >>>>> -        <p>
> > >>>>> -          If the value is nonzero, then it will be forced to a
> value of
> > >>>>> -          at least 5s.
> > >>>>> -        </p>
> > >>>>> -      </dd>
> > >>>>> -
> > >>>>>       <dt><code>external_ids:ovn-encap-type</code></dt>
> > >>>>>       <dd>
> > >>>>>         <p>
> > >>>>> diff --git a/controller/ovn-controller.c
> b/controller/ovn-controller.c
> > >>>>> index 3a81a13fb..732e7a690 100644
> > >>>>> --- a/controller/ovn-controller.c
> > >>>>> +++ b/controller/ovn-controller.c
> > >>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func
> debug_ignore_startup_delay;
> > >>>>>   #define DEFAULT_BRIDGE_NAME "br-int"
> > >>>>> #define DEFAULT_DATAPATH "system"
> > >>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
> > >>>>>   #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
> > >>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
> > >>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct
> ovsrec_ssl_table *ssl_table)
> > >>>>>     }
> > >>>>> }
> > >>>>> -static int
> > >>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
> > >>>>> -{
> > >>>>> -    const struct ovsrec_open_vswitch *cfg =
> ovsrec_open_vswitch_first(ovs_idl);
> > >>>>> -    if (!cfg) {
> > >>>>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
> > >>>>> -    }
> > >>>>> -
> > >>>>> -    const struct ovsrec_open_vswitch_table *ovs_table =
> > >>>>> -        ovsrec_open_vswitch_table_get(ovs_idl);
> > >>>>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> > >>>>> -    return get_chassis_external_id_value_int(
> > >>>>> -        &cfg->external_ids, chassis_id,
> > >>>>> -        "ovn-openflow-probe-interval",
> OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
> > >>>>> -}
> > >>>>> -
> > >>>>> /* Retrieves the pointer to the OVN Southbound database from
> 'ovs_idl' and
> > >>>>>  * updates 'sbdb_idl' with that pointer. */
> > >>>>> static void
> > >>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
> > >>>>>         engine_get_internal_data(&en_lb_data);
> > >>>>>       ofctrl_init(&lflow_output_data->group_table,
> > >>>>> -                &lflow_output_data->meter_table,
> > >>>>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
> > >>>>> +                &lflow_output_data->meter_table);
> > >>>>>     ofctrl_seqno_init();
> > >>>>>       unixctl_command_register("group-table-list", "", 0, 0,
> > >>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
> > >>>>>                      &reset_ovnsb_idl_min_index,
> > >>>>>                      &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
> > >>>>>         update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
> > >>>>> -
>  ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
> > >>>>>           struct ovsdb_idl_txn *ovnsb_idl_txn
> > >>>>>             = ovsdb_idl_loop_run(&ovnsb_idl_loop);
> > >>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > >>>>> index c396ad4c2..a86db3f32 100644
> > >>>>> --- a/controller/pinctrl.c
> > >>>>> +++ b/controller/pinctrl.c
> > >>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
> > >>>>>     static long long int svc_monitors_next_run_time = LLONG_MAX;
> > >>>>>     static long long int send_prefixd_time = LLONG_MAX;
> > >>>>> -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>>       while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
> > >>>>>         long long int bfd_time = LLONG_MAX;
> > >>>>> diff --git a/lib/features.c b/lib/features.c
> > >>>>> index 97c9c86f0..ad3357570 100644
> > >>>>> --- a/lib/features.c
> > >>>>> +++ b/lib/features.c
> > >>>>> @@ -28,11 +28,10 @@
> > >>>>> #include "openvswitch/ofp-meter.h"
> > >>>>> #include "openvswitch/ofp-util.h"
> > >>>>> #include "ovn/features.h"
> > >>>>> +#include "controller/ofctrl.h"
> > >>>>>   VLOG_DEFINE_THIS_MODULE(features);
> > >>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
> > >>>>> -
> > >>>>> struct ovs_feature {
> > >>>>>     enum ovs_feature_value value;
> > >>>>>     const char *name;
> > >>>>> @@ -82,8 +81,7 @@ static void
> > >>>>> ovs_feature_rconn_setup(const char *br_name)
> > >>>>> {
> > >>>>>     if (!swconn) {
> > >>>>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC,
> 0,
> > >>>>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 <<
> OFP15_VERSION);
> > >>>>>     }
> > >>>>>       if (!rconn_is_connected(swconn)) {
> > >>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
> > >>>>>         }
> > >>>>>         free(target);
> > >>>>>     }
> > >>>>> -    rconn_set_probe_interval(swconn,
> FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
> > >>>>> }
> > >>>>>   static bool
> > >>>>
> > >>> _______________________________________________
> > >>> dev mailing list
> > >>> d...@openvswitch.org
> > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
> > Regards,
> > Vladislav Odintsov
> >
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to