On Wed, Aug 5, 2020 at 4:21 PM Girish Moodalbail <gmoodalb...@gmail.com>
wrote:

>
>
> On Wed, Aug 5, 2020 at 3:35 PM Han Zhou <zhou...@gmail.com> wrote:
>
>>
>>
>> On Wed, Aug 5, 2020 at 12:58 PM Winson Wang <windson.w...@gmail.com>
>> wrote:
>>
>>> Hello OVN Experts,
>>>
>>> With ovn-k8s, the flows on br-int that the running pods on a k8s node
>>> need must stay installed at all times.
>>> Is there an ongoing project to address this problem?
>>> If not, I have a proposal, though I am not sure whether it is doable.
>>> Please share your thoughts.
>>> The issue:
>>>
>>> In a large-scale ovn-k8s cluster there are 200K+ OpenFlow flows on br-int
>>> on every K8s node. When we restart ovn-controller for an upgrade using
>>> `ovs-appctl -t ovn-controller exit --restart`, existing traffic still
>>> works fine since the flows on br-int remain installed.
>>>
>>> However, when the new ovn-controller starts, it connects to the OVS IDL
>>> and does an engine init run, clearing all OpenFlow flows and then
>>> installing flows based on the SB DB.
>>>
>>> With more than 200K OpenFlow flows, it took more than 15 seconds to get
>>> all the flows installed on the br-int bridge again.
>>>
>>> Proposed solution for the issue:
>>>
>>> When ovn-controller gets “exit --restart”, it will write an
>>> “ovs-cond-seqno” to the OVS IDL and store the value in the external-ids
>>> column of the Open_vSwitch table. When the new ovn-controller starts, it
>>> will check whether “ovs-cond-seqno” exists in the Open_vSwitch table and
>>> get the seqno from the OVS IDL to decide whether to force a recompute.
>>>
>>>
>> Hi Winson,
>>
>> Thanks for the proposal. Yes, the connection break during upgrades is a
>> real issue in a large-scale environment. However, the proposal doesn't
>> work. The "ovs-cond-seqno" is for the OVSDB IDL for the local conf DB,
>> which is a completely different connection from the ovs-vswitchd OpenFlow
>> connection.
>> To avoid clearing the OpenFlow table during ovn-controller startup, we
>> can find a way to postpone clearing the OVS flows until the recompute in
>> ovn-controller has completed, right before ovn-controller replaces them
>> with the new flows. This should greatly reduce how long connections are
>> broken during an upgrade. Some changes in the ofctrl module's state
>> machine are required, but I am not 100% sure that this approach is
>> applicable. I need to check more details.
>>
>>
> Thanks Han. Yes, postponing the clearing of OpenFlow flows until all of the
> logical flows have been translated to OpenFlow flows will reduce the
> connection downtime. The question, though, is whether we can use
> 'replace-flows' or a 'mod-flows' equivalent, wherein the unmodified flows
> remain intact and the sessions related to those flows do not see any
> downtime.
>
I am not sure about "replace-flows". However, I think these are
independent optimizations. I think postponing the clearing would solve the
major part of the problem. I believe that currently > 90% of the time is
spent waiting for the computation to finish while the OVS flows are already
cleared, rather than on the one-time flow installation. But yes, that could
be a further optimization.
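
To make the two ideas concrete, here is a rough Python sketch. It is not
ovn-controller code: the function names, the simplified (match, actions)
flow tuples, and the 13.5 s / 1.5 s split are all hypothetical, picked only
to be consistent with the "more than 15 seconds" and "> 90%" figures in this
thread. The first two functions model the downtime of the two orderings; the
third shows the kind of diff a replace-flows style update would need so that
unmodified flows are left alone.

```python
from typing import Set, Tuple

# Hypothetical, simplified flow representation: (match, actions).
Flow = Tuple[str, str]


def downtime_clear_first(compute_s: float, install_s: float) -> float:
    """Current behaviour as described in the thread: flows are cleared at
    startup, so traffic is affected for the recompute plus the install."""
    return compute_s + install_s


def downtime_clear_late(compute_s: float, install_s: float) -> float:
    """Postponed clearing: old flows stay in place during the recompute,
    so only the install step contributes to the downtime."""
    return install_s


def diff_flows(installed: Set[Flow], desired: Set[Flow]):
    """Split flows into those to add, those to delete, and those that can
    stay untouched (the further optimization Girish asks about)."""
    to_add = desired - installed
    to_delete = installed - desired
    unchanged = installed & desired
    return to_add, to_delete, unchanged


# Assumed numbers for illustration only: 90% of 15 s spent recomputing.
COMPUTE_S = 13.5
INSTALL_S = 1.5

old_flows = {
    ("ip,nw_dst=10.0.0.1", "output:1"),
    ("ip,nw_dst=10.0.0.2", "output:2"),
}
new_flows = {
    ("ip,nw_dst=10.0.0.1", "output:1"),
    ("ip,nw_dst=10.0.0.3", "output:3"),
}
to_add, to_delete, unchanged = diff_flows(old_flows, new_flows)
```

Under these assumed numbers, postponing the clearing removes the recompute
time from the outage, and the diff step would additionally leave the
10.0.0.1 flow in place instead of deleting and re-adding it.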


> Regards,
> ~Girish
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
