On Wed, Aug 5, 2020 at 5:36 PM Han Zhou <zhou...@gmail.com> wrote:

> On Wed, Aug 5, 2020 at 4:21 PM Girish Moodalbail <gmoodalb...@gmail.com>
> wrote:
>
>> On Wed, Aug 5, 2020 at 3:35 PM Han Zhou <zhou...@gmail.com> wrote:
>>
>>> On Wed, Aug 5, 2020 at 12:58 PM Winson Wang <windson.w...@gmail.com>
>>> wrote:
>>>
>>>> Hello OVN Experts,
>>>>
>>>> With ovn-k8s, we need to always keep on br-int the flows that are
>>>> needed by running pods on the k8s node.
>>>> Is there an ongoing project to address this problem?
>>>> If not, I have one proposal, though I am not sure if it is doable.
>>>> Please share your thoughts.
>>>>
>>>> The issue:
>>>>
>>>> In a large-scale ovn-k8s cluster there are 200K+ OpenFlow flows on
>>>> br-int on every K8s node. When we restart ovn-controller for an
>>>> upgrade using `ovs-appctl -t ovn-controller exit --restart`, the
>>>> remaining traffic still works fine since the flows on br-int are
>>>> still installed.
>>>>
>>>> However, when a new ovn-controller starts, it will connect to the OVS
>>>> IDL and do an engine init run, clearing all OpenFlow flows and
>>>> installing flows based on the SB DB.
>>>>
>>>> With an OpenFlow flow count above 200K, it took more than 15 seconds
>>>> to get all the flows installed on the br-int bridge again.
>>>>
>>>> Proposed solution for the issue:
>>>>
>>>> When the ovn-controller gets “exit --restart”, it will write an
>>>> “ovs-cond-seqno” to the OVS IDL and store the value in the
>>>> external-ids column of the Open_vSwitch table. When a new
>>>> ovn-controller starts, it will check if the “ovs-cond-seqno” exists
>>>> in the Open_vSwitch table, and get the seqno from the OVS IDL to
>>>> decide if it will force a recomputing process?
>>>>
>>> Hi Winson,
>>>
>>> Thanks for the proposal. Yes, the connection break during upgrading is
>>> a real issue in a large-scale environment. However, the proposal
>>> doesn't work. The "ovs-cond-seqno" is for the OVSDB IDL for the local
>>> conf DB, which is a completely different connection from the
>>> ovs-vswitchd OpenFlow connection.
>>>
>>> To avoid clearing the OpenFlow table during ovn-controller startup, we
>>> can find a way to postpone clearing the OVS flows until after the
>>> recomputing in ovn-controller is completed, right before
>>> ovn-controller replaces them with the new flows. This should largely
>>> reduce the time the connection is broken during upgrading. Some
>>> changes in the ofctrl module's state machine are required, but I am
>>> not 100% sure if this approach is applicable. Need to check more
>>> details.
>>>
>> Thanks Han. Yes, postponing the clearing of OpenFlow flows until all of
>> the logical flows have been translated to OpenFlow flows will reduce
>> the connection downtime. The question, though, is whether we can use
>> 'replace-flows' or a 'mod-flows' equivalent, wherein the non-modified
>> flows remain intact and all the sessions related to those flows will
>> not face any downtime?
>>
> I am not sure about the "replace-flows". However, I think these are
> independent optimizations. I think postponing the clearing would solve
> the major part of the problem. I believe currently more than 90% of the
> time is spent on waiting for computing to finish while the OVS flows are
> already cleared, instead of on the one-time flow installation. But yes,
> that could be a further optimization.
>
Agree.
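
To illustrate the 'replace-flows' idea discussed above, here is a minimal Python sketch. It is not ovn-controller or ofctrl code; the flow model (a dict keyed by table, priority and match) and the function name `diff_flows` are hypothetical and exist only for this example. It shows how comparing the installed flows against the recomputed ones leaves unchanged flows untouched, so only the differences need to be sent to the switch.

```python
from typing import Dict, List, Tuple

# Hypothetical flow model for illustration only:
# key = (table, priority, match), value = actions string.
FlowKey = Tuple[int, int, str]
FlowTable = Dict[FlowKey, str]


def diff_flows(installed: FlowTable, desired: FlowTable) -> Dict[str, List[FlowKey]]:
    """Return the flows to add, modify and delete to turn `installed` into `desired`.

    Flows present in both tables with identical actions appear in none of the
    lists, i.e. they would stay in place and keep forwarding traffic.
    """
    add = [k for k in desired if k not in installed]
    modify = [k for k in desired if k in installed and installed[k] != desired[k]]
    delete = [k for k in installed if k not in desired]
    return {"add": add, "modify": modify, "delete": delete}


# Flows left on br-int by the old ovn-controller (made-up values).
installed = {
    (0, 100, "in_port=1"): "output:2",
    (0, 100, "in_port=2"): "output:1",
    (10, 50, "ip"): "drop",
}

# Flows computed by the new ovn-controller after its full recompute.
desired = {
    (0, 100, "in_port=1"): "output:2",   # unchanged, never touched
    (0, 100, "in_port=2"): "output:3",   # same match, new actions
    (20, 10, "arp"): "normal",           # new flow
}

changes = diff_flows(installed, desired)
print(changes)
```

In this sketch the diff can only be computed once `desired` is complete, which is the same precondition as postponing the clear until the recompute has finished; in that sense the two optimizations are independent but compatible.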
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss