Re: [ovs-dev] [PATCH ovn branch-21.12 2/4] ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down time during upgrade.

2022-08-04 Thread 0-day Robot
Bleep bloop.  Greetings Mark Michelson, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 84 characters long (recommended limit is 79)
#321 FILE: controller/ovn-controller.8.xml:289:
ovn-appctl -t ovn-controller stopwatch/show 
flow-generation

Lines checked: 439, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn branch-21.12 2/4] ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down time during upgrade.

2022-08-04 Thread Mark Michelson
From: Han Zhou 

Whenever OpenFlow connection between ovn-controller and OVS is
connected/reconnected, typically during ovn-controller/OVS
restart/upgrade, ovn-controller would:
1. clears all the existing flows after the initial hand-shaking
2. compute the new flows
3. install the new flows to OVS.

In large scale environments when there are a big number of flows
needs to be computed by ovn-controller, the step 2 and 3 above can take
very long time (from seconds to minutes, depending on the scale and
topology), which would cause a data plane down time.

The purpose of this patch is to reduce data plane down time during
ovn-controller restart/upgrade. It adds a new state S_WAIT_BEFORE_CLEAR
to the ofctrl state machine, so that ovn-controller waits for a period
of time before clearing the existing OVS flows while it is computing the
new flows. Ideally, ovn-controller should clear the flows after it
completes the new flows computing. However, it is difficult for
ovn-controller to judge if it has generated all the new flows (or the
flows that are sufficient to avoid down time), because flows computation
depends on iterations of SB monitor data processing and condition
changes. So, instead of try to determine by ovn-controller itself, this
patch provides a configuration "ovn-ofctrl-wait-before-clear" for users
to determine the feasible time to wait, according to the scale. As
mentioned in the document, the output of

  ovn-appctl -t ovn-controller inc-engine/recompute
  ovn-appctl -t ovn-controller stopwatch/show flow-generation

can help users determining what value to set.

Tested with ~200k ovs flows generated by ovn-controller on a system with 8
Armv8 A72 cores, with ovn-ofctrl-wait-before-clear set to 7000 (the
stopwatch/show flow-generation Maximum is 3596ms).

Without this patch, there were ~13 seconds ping failures during ovn-controller
restart.

With this patch, the ping failure reduced to ~6 seconds.
(The ~6 seconds down time was introduced by step 3 - install the new
flows to ovs, which will be addressed in future patches)

Signed-off-by: Han Zhou 
---
 controller/ofctrl.c | 87 -
 controller/ofctrl.h |  4 +-
 controller/ovn-controller.8.xml | 34 +
 controller/ovn-controller.c |  6 +--
 tests/ovn-controller.at | 55 -
 5 files changed, 168 insertions(+), 18 deletions(-)

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 72707baf9..ecabbcc9b 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -47,6 +47,7 @@
 #include "physical.h"
 #include "openvswitch/rconn.h"
 #include "socket-util.h"
+#include "timeval.h"
 #include "util.h"
 #include "vswitch-idl.h"
 
@@ -272,6 +273,7 @@ static unsigned int seqno;
 STATE(S_NEW)\
 STATE(S_TLV_TABLE_REQUESTED)\
 STATE(S_TLV_TABLE_MOD_SENT) \
+STATE(S_WAIT_BEFORE_CLEAR)  \
 STATE(S_CLEAR_FLOWS)\
 STATE(S_UPDATE_FLOWS)
 enum ofctrl_state {
@@ -314,6 +316,14 @@ static uint64_t cur_cfg;
 /* Current state. */
 static enum ofctrl_state state;
 
+/* The time (ms) to stay in the state S_WAIT_BEFORE_CLEAR. Read from
+ * external_ids: ovn-ofctrl-wait-before-clear. */
+static unsigned int wait_before_clear_time = 0;
+
+/* The time when the state S_WAIT_BEFORE_CLEAR should complete.
+ * If the timer is not started yet, it is set to 0. */
+static long long int wait_before_clear_expire = 0;
+
 /* Transaction IDs for messages in flight to the switch. */
 static ovs_be32 xid, xid2;
 
@@ -419,18 +429,19 @@ recv_S_NEW(const struct ofp_header *oh OVS_UNUSED,
  * If we receive an NXT_TLV_TABLE_REPLY:
  *
  * - If it contains our tunnel metadata option, assign its field ID to
- *   mff_ovn_geneve and transition to S_CLEAR_FLOWS.
+ *   mff_ovn_geneve and transition to S_WAIT_BEFORE_CLEAR.
  *
  * - Otherwise, if there is an unused tunnel metadata field ID, send
  *   NXT_TLV_TABLE_MOD and OFPT_BARRIER_REQUEST, and transition to
  *   S_TLV_TABLE_MOD_SENT.
  *
  * - Otherwise, log an error, disable Geneve, and transition to
- *   S_CLEAR_FLOWS.
+ *   S_WAIT_BEFORE_CLEAR.
  *
  * If we receive an OFPT_ERROR:
  *
- * - Log an error, disable Geneve, and transition to S_CLEAR_FLOWS. */
+ * - Log an error, disable Geneve, and transition to S_WAIT_BEFORE_CLEAR.
+ */
 
 static void
 run_S_TLV_TABLE_REQUESTED(void)
@@ -457,7 +468,7 @@ process_tlv_table_reply(const struct 
ofputil_tlv_table_reply *reply)
 return false;
 } else {
 mff_ovn_geneve = MFF_TUN_METADATA0 + map->index;
-state = S_CLEAR_FLOWS;
+state = S_WAIT_BEFORE_CLEAR;
 return true;
 }
 }
@@ -524,7 +535,7 @@ recv_S_TLV_TABLE_REQUESTED(const struct ofp_header *oh, 
enum ofptype type,
 
 /* Error path. */
 mff_ovn_geneve = 0;
-