On Sat, Aug 8, 2020 at 12:16 AM Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Thu, Aug 6, 2020 at 10:22 AM Han Zhou <zhou...@gmail.com> wrote:
>
>>
>>
>> On Thu, Aug 6, 2020 at 9:15 AM Numan Siddique <num...@ovn.org> wrote:
>>
>>>
>>>
>>> On Thu, Aug 6, 2020 at 9:25 PM Venugopal Iyer <venugop...@nvidia.com>
>>> wrote:
>>>
>>>> Hi, Han:
>>>>
>>>>
>>>>
>>>> A comment inline:
>>>>
>>>>
>>>>
>>>> *From:* ovn-kuberne...@googlegroups.com <
>>>> ovn-kuberne...@googlegroups.com> *On Behalf Of *Han Zhou
>>>> *Sent:* Wednesday, August 5, 2020 3:36 PM
>>>> *To:* Winson Wang <windson.w...@gmail.com>
>>>> *Cc:* ovs-discuss@openvswitch.org; ovn-kuberne...@googlegroups.com;
>>>> Dumitru Ceara <dce...@redhat.com>; Han Zhou <hz...@ovn.org>
>>>> *Subject:* Re: ovn-k8s scale: how to make new ovn-controller process
>>>> keep the previous Open Flow in br-int
>>>>
>>>>
>>>> On Wed, Aug 5, 2020 at 12:58 PM Winson Wang <windson.w...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello OVN Experts,
>>>>
>>>>
>>>> With ovn-k8s, the flows on br-int that are needed by the pods running on
>>>> the k8s node must stay installed at all times.
>>>>
>>>> Is there an ongoing project to address this problem?
>>>>
>>>> If not, I have one proposal, though I am not sure if it is doable.
>>>>
>>>> Please share your thoughts.
>>>> The issue:
>>>>
>>>> In a large-scale ovn-k8s cluster there are 200K+ OpenFlow flows on br-int
>>>> on every k8s node.  When we stop ovn-controller for an upgrade using
>>>> `ovs-appctl -t ovn-controller exit --restart`, existing traffic still works
>>>> fine, since the flows remain installed on br-int.
>>>>
>>>>
>>>>
>>>> However, when the new ovn-controller starts, it connects to the OVS IDL,
>>>> does an engine init run, clears all OpenFlow flows, and installs flows
>>>> based on the SB DB.
>>>>
>>>> With a flow count above 200K, it takes more than 15 seconds to get all the
>>>> flows installed on br-int again.
>>>>
>>>>
>>>> Proposal solution for the issue:
>>>>
>>>> When ovn-controller receives “exit --restart”, it would write an
>>>> “ovs-cond-seqno” value to the external-ids column of the Open_vSwitch
>>>> table through the OVS IDL. When the new ovn-controller starts, it would
>>>> check whether “ovs-cond-seqno” exists in the Open_vSwitch table and compare
>>>> it with the seqno from the OVS IDL to decide whether to force a recompute.
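>>>>
>>>> (Roughly, the stash/check could look like the sketch below; the key name
>>>> follows the proposal and the commands are only illustrative, untested:)
>>>>
>>>>     # On graceful exit: stash the last OVSDB IDL condition seqno.
>>>>     ovs-vsctl set Open_vSwitch . external_ids:ovs-cond-seqno="$SEQNO"
>>>>     # On startup: read it back and compare before forcing a full recompute.
>>>>     ovs-vsctl --if-exists get Open_vSwitch . external_ids:ovs-cond-seqno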
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Hi Winson,
>>>>
>>>>
>>>>
>>>> Thanks for the proposal. Yes, the connection break during upgrading is
>>>> a real issue in a large scale environment. However, the proposal doesn't
>>>> work. The "ovs-cond-seqno" is for the OVSDB IDL for the local conf DB,
>>>> which is a completely different connection from the ovs-vswitchd open-flow
>>>> connection.
>>>>
>>>> To avoid clearing the OpenFlow table during ovn-controller startup, we
>>>> could find a way to postpone clearing the OVS flows until the recomputation
>>>> in ovn-controller is complete, right before ovn-controller replaces them
>>>> with the new flows.
>>>>
>>>> *[vi> ] It seems we force a recompute today whenever the OVS IDL is
>>>> reconnected. Would it be possible to defer the decision to recompute the
>>>> flows based on the SB's nb_cfg we have synced with? i.e., if our nb_cfg is
>>>> in sync with the SB's global nb_cfg, we can skip the recompute? At least if
>>>> nothing has changed since the restart, we won't need to do anything. We
>>>> could stash nb_cfg in OVS (once ovn-controller receives confirmation from
>>>> OVS that the physical flows for an nb_cfg update are in place), which
>>>> should be cleared if OVS itself is restarted. (Currently, nb_cfg is used to
>>>> check whether NB, SB, and Chassis are in sync; we could extend this to
>>>> OVS/physical flows?)*
>>>>
>>>> *I have not thought this through, so maybe I am missing something…*
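>>>>
>>>> *(A rough sketch of the stashing part; the key name below is hypothetical
>>>> and purely illustrative, untested:)*
>>>>
>>>>     # After ovn-controller confirms the physical flows for an nb_cfg update
>>>>     # are in place (hypothetical key name):
>>>>     ovs-vsctl set Open_vSwitch . external_ids:ovn-installed-nb-cfg="$NB_CFG"
>>>>     # On restart: skip the forced recompute if this still matches the SB's
>>>>     # global nb_cfg; the key would have to be cleared whenever OVS itself
>>>>     # restarts.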
>>>>
>>>>
>>>>
>>>> *Thanks,*
>>>>
>>>>
>>>>
>>>> *-venu*
>>>>
>>>> This should largely reduce the connection-break time during upgrading.
>>>> Some changes in the ofctrl module's state machine would be required, but I
>>>> am not 100% sure whether this approach is applicable; I need to check more
>>>> details.
>>>>
>>>
>>>
>>> We can also consider whether it is possible to do it the following way:
>>>    - When ovn-controller starts, it does not clear the flows; instead it
>>> gets a dump of the flows from br-int and populates them into its installed
>>> flows.
>>>     - Then, when it connects to the SB DB and computes the desired flows,
>>> it syncs the installed flows with the desired flows anyway.
>>>     - And if there is no difference between the desired flows and the
>>> installed flows, there is no impact on the datapath at all.
>>>
>>> This would require careful thought and proper handling, though.
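>>>
>>> (The CLI equivalent of the first step would be something like the line
>>> below; ovn-controller itself would do this over its OpenFlow connection
>>> rather than by shelling out, so this is only illustrative:)
>>>
>>>     # Dump the flows currently installed on br-int, i.e. what ovn-controller
>>>     # would read back into its installed-flows table instead of clearing:
>>>     ovs-ofctl dump-flows br-int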
>>>
>>
>> Numan, as I responded to Girish, this avoids the time spent on the
>> one-time flow installation after restart (the < 10% part of the
>> connection-break time), but I think currently the major problem is that
>> > 90% of the time is spent waiting for the computation to finish while the
>> OVS flows are already cleared. It is surely an optimization, but the most
>> important thing now is to avoid that 90%. I will look at postponing the
>> clearing of flows first.
>>
>>
>
> I thought about this again. It is more complicated than it appeared, so let
> me summarize here:
>
> The connection-break time during the upgrade consists of two parts:
> 1) The time gap between clearing the flows and the start of the flow
> installation for the fully computed flows, i.e. waiting for flow
> installation.
> 2) The time spent during flow installation, which takes several rounds of
> the ovn-controller main loop. (I take back my earlier statement that this
> contributes only 10% of the total time. According to the log shared by
> Girish, it seems at least 50% of the time is spent here.)
>
> For 1), postponing the clearing of flows is the solution, but it is not as
> easy as I thought, because there is no easy way to determine whether
> ovn-controller has completed the initial computation.
> When ovn-controller starts, it initializes the IDL connections to the SB and
> the local OVSDB, and sends the initial monitor conditions to the SB DB. It
> may take several rounds of receiving SB notifications, updating monitor
> conditions, and computing to generate all the flows required. If we replace
> the flows in OVS before this is fully complete, we end up with the same
> problem. I can't think of an ideal and clean approach to solve this.
> However, a "not so good" solution could be to support an option for the
> ovn-controller command that delays the clearing of the OVS flows. It is then
> the operator's job to figure out the best delay for the scale of their
> environment, to reduce the time gap spent waiting for the new flow
> installation. This is not an ideal approach, but I think it should be
> helpful for upgrading large-scale environments in practice. Thoughts?
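>
> (Purely as an illustration of what such a knob might look like -- the option
> name below is hypothetical, not an existing ovn-controller setting:)
>
>     # Hypothetical per-chassis setting: wait N ms after (re)connecting to
>     # ovs-vswitchd before clearing the existing flows on br-int.
>     ovs-vsctl set Open_vSwitch . external_ids:ovn-ofctrl-wait-before-clear=30000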
>
> For 2), Numan's suggestion of syncing back the OVS flows before flow
> installation and installing only the delta (without clearing the flows)
> seems to be the perfect solution. However, there are some tricky parts that
> need to be considered:
> 1. Apart from OVS flows, the meter and group tables also need to be restored.
> 2. The installed flows in ovn-controller require some other metadata that
> is not available from OVS, such as sb_uuid.
> 3. The syncing itself may add significant extra cost and further delay the
> initialization.
>
> Alternatively, for 2), we could probably use the "bundle" operation of
> OpenFlow to replace the flows in OVS atomically (on the ovs-vswitchd side),
> which should avoid the long connection break. I am not sure yet which of the
> two is more applicable.
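>
> (For reference, something similar is already exposed by ovs-ofctl, which can
> diff a flow file against the current table and apply the changes in a single
> bundle:)
>
>     # Atomically replace br-int's flow table with the contents of flows.txt,
>     # so there is no window where only part of the flows is present:
>     ovs-ofctl --bundle replace-flows br-int flows.txt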
>
> I'd also like to emphasize that even though the solution for 2) doesn't
> clear flows, it doesn't automatically avoid problem 1), because we still
> need to figure out when the bulk of the flow computation is complete and
> ready to be installed/synced to OVS. Otherwise, we could replace the old,
> huge flow table with a small number of incomplete flows, which results in
> the same connection break.
>


I have another suggestion to handle this issue during upgrade.

Let's say br-int has ports p1, p2, ..., p10 that correspond to the logical
ports p1, p2, ..., p10.

Then the following can be done:

1. Create a temporary bridge - br-temp
    ovs-vsctl add-br br-temp

2. Create ports corresponding to p1 ... p10 in br-temp with different names
but with external_ids:iface-id set properly.
   Eg.
   ovs-vsctl add-port br-temp temp-p1 -- set interface temp-p1
type=internal -- set interface temp-p1 external_ids:iface-id=p1
   ..
   ..
   ovs-vsctl add-port br-temp temp-p10 -- set interface temp-p10
type=internal -- set interface temp-p10 external_ids:iface-id=p10

    (I think this can be easily scripted; see the sketch after step 7.)

3. Just before restarting ovn-controller, run -   ovs-vsctl set open .
external_ids:ovn-bridge=br-temp

4. Restart ovn-controller after upgrading

5. Wait till ovn-controller connects to the SB ovsdb-server and all the
flows appear in br-temp

6. Switch the ovn bridge back to br-int - ovs-vsctl set open .
external_ids:ovn-bridge=br-int

7. Delete br-temp - ovs-vsctl del-br br-temp
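
A rough, untested sketch of scripting step 2 (assuming the p1 ... p10 names
used in the example above):

    for i in $(seq 1 10); do
        ovs-vsctl add-port br-temp "temp-p$i" -- \
            set interface "temp-p$i" type=internal external_ids:iface-id="p$i"
    done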

Up to step 5, there should not be any datapath impact, as br-int is untouched
and all its flows are still there.

There could be some downtime after step 6, as ovn-controller may delete all
the flows in br-int and re-add them, but the duration should be shorter.

Please note I have not tested this myself, but it's worth testing in a small
environment before trying it on an actual deployment.

You could skip step 2, but if ovn-monitor-all is false, then you would
still see some delay due to conditional monitoring.

This is totally under the operator's/admin's control, and there is no need
for any ovn-controller changes. We can still work on approach (2) and handle
all the tricky parts mentioned by Han, but that may take time.

Any thoughts on this? We used a similar approach when I worked on a migration
script to migrate an existing OpenStack deployment from ML2/OVS to ML2/OVN.


Thanks
Numan


> Thanks,
> Han
>
>
>>> Thanks
>>> Numan
>>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Han
>>>>
>>>> Test log:
>>>>
>>>> Check the flow count on br-int every second:
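>>>>
>>>> (Presumably collected with something like the following -- the exact
>>>> command was not included in the thread, so this is an assumption:)
>>>>
>>>>     while sleep 1; do ovs-ofctl dump-aggregate br-int; done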
>>>>
>>>>
>>>>
>>>> packet_count=0 byte_count=0 flow_count=0
>>>>
>>>> packet_count=0 byte_count=0 flow_count=0
>>>>
>>>> packet_count=0 byte_count=0 flow_count=0
>>>>
>>>> packet_count=0 byte_count=0 flow_count=0
>>>>
>>>> packet_count=0 byte_count=0 flow_count=0
>>>>
>>>> packet_count=0 byte_count=0 flow_count=0
>>>>
>>>> packet_count=0 byte_count=0 flow_count=10322
>>>>
>>>> packet_count=0 byte_count=0 flow_count=34220
>>>>
>>>> packet_count=0 byte_count=0 flow_count=60425
>>>>
>>>> packet_count=0 byte_count=0 flow_count=82506
>>>>
>>>> packet_count=0 byte_count=0 flow_count=106771
>>>>
>>>> packet_count=0 byte_count=0 flow_count=131648
>>>>
>>>> packet_count=2 byte_count=120 flow_count=158303
>>>>
>>>> packet_count=29 byte_count=1693 flow_count=185999
>>>>
>>>> packet_count=188 byte_count=12455 flow_count=212764
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Winson
>>>>