On Sat, Aug 8, 2020 at 12:16 AM Han Zhou <zhou...@gmail.com> wrote:
>
> On Thu, Aug 6, 2020 at 10:22 AM Han Zhou <zhou...@gmail.com> wrote:
>>
>> On Thu, Aug 6, 2020 at 9:15 AM Numan Siddique <num...@ovn.org> wrote:
>>>
>>> On Thu, Aug 6, 2020 at 9:25 PM Venugopal Iyer <venugop...@nvidia.com> wrote:
>>>>
>>>> Hi, Han:
>>>>
>>>> A comment inline:
>>>>
>>>> From: ovn-kuberne...@googlegroups.com <ovn-kuberne...@googlegroups.com> On Behalf Of Han Zhou
>>>> Sent: Wednesday, August 5, 2020 3:36 PM
>>>> To: Winson Wang <windson.w...@gmail.com>
>>>> Cc: ovs-discuss@openvswitch.org; ovn-kuberne...@googlegroups.com; Dumitru Ceara <dce...@redhat.com>; Han Zhou <hz...@ovn.org>
>>>> Subject: Re: ovn-k8s scale: how to make new ovn-controller process keep the previous Open Flow in br-int
>>>>
>>>> On Wed, Aug 5, 2020 at 12:58 PM Winson Wang <windson.w...@gmail.com> wrote:
>>>>
>>>> Hello OVN Experts,
>>>>
>>>> With ovn-k8s, we need to keep the flows on br-int at all times, since they are needed by the pods running on the k8s node.
>>>> Is there an ongoing project to address this problem? If not, I have one proposal, though I am not sure if it is doable. Please share your thoughts.
>>>>
>>>> The issue:
>>>>
>>>> In a large-scale ovn-k8s cluster there are 200K+ OpenFlow flows on br-int on every k8s node. When we restart ovn-controller for an upgrade using `ovs-appctl -t ovn-controller exit --restart`, existing traffic still works fine, since br-int keeps its installed flows.
>>>>
>>>> However, when the new ovn-controller starts, it connects to the OVS IDL and does an engine init run, clearing all OpenFlow flows and installing flows computed from the SB DB. With a flow count above 200K, it took more than 15 seconds to get all the flows installed on the br-int bridge again.
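[Editor's note: the restart sequence described above can be sketched as below. This is a dry run that only prints the commands; the systemctl step is an assumed example, since deployments start ovn-controller in different ways.]

```shell
#!/bin/sh
# Dry-run sketch of the upgrade restart described above. The commands
# are printed, not executed; the systemctl line is an assumption for
# illustration (real deployments vary).
upgrade_restart_cmds() {
    # Graceful exit: ovn-controller detaches but leaves br-int's
    # OpenFlow tables intact, so existing traffic keeps flowing.
    echo "ovs-appctl -t ovn-controller exit --restart"
    # Start the upgraded binary; today it clears and re-installs all
    # flows after recomputing them from the SB DB.
    echo "systemctl start ovn-controller"
}

upgrade_restart_cmds
```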
>>>> Proposed solution:
>>>>
>>>> When ovn-controller receives "exit --restart", it writes an "ovs-cond-seqno" to the OVS IDL and stores the value in the external-ids column of the Open_vSwitch table. When the new ovn-controller starts, it checks whether "ovs-cond-seqno" exists in the Open_vSwitch table and uses the seqno from the OVS IDL to decide whether to force a recompute.
>>>>
>>>> Hi Winson,
>>>>
>>>> Thanks for the proposal. Yes, the connection break during upgrading is a real issue in a large-scale environment. However, the proposal doesn't work: the "ovs-cond-seqno" is for the OVSDB IDL for the local conf DB, which is a completely different connection from the ovs-vswitchd OpenFlow connection.
>>>>
>>>> To avoid clearing the OpenFlow table during ovn-controller startup, we can find a way to postpone clearing the OVS flows until the recomputation in ovn-controller is complete, right before ovn-controller replaces them with the new flows.
>>>>
>>>> [vi> ] Seems like we force a recompute today if the OVS IDL is reconnected. Would it be possible to defer the decision to recompute the flows based on the SB's nb_cfg we have synced with? I.e., if our nb_cfg is in sync with the SB's global nb_cfg, we can skip the recompute? At least if nothing has changed since the restart, we won't need to do anything. We could stash nb_cfg in OVS (once ovn-controller receives confirmation from OVS that the physical flows for an nb_cfg update are in place), which should be cleared if OVS itself is restarted. (I mean, currently nb_cfg is used to check whether NB, SB and Chassis are in sync; we could extend this to OVS/physical flows?)
>>>>
>>>> Have not thought through this though,
so maybe I am missing something ...
>>>>
>>>> Thanks,
>>>>
>>>> -venu
>>>>
>>>> This should largely reduce the connection-break time during upgrading. Some changes in the ofctrl module's state machine are required, but I am not 100% sure whether this approach is applicable. Need to check more details.
>>>
>>> We can also consider doing it the following way:
>>> - When ovn-controller starts, it does not clear the flows; instead, it gets a dump of the flows from br-int and populates them into its installed flows.
>>> - Then, when it connects to the SB DB and computes the desired flows, it syncs the installed flows with the desired flows anyway.
>>> - And if there is no difference between the desired flows and the installed flows, there is no impact on the datapath at all.
>>>
>>> This would require careful thought and proper handling, though.
>>
>> Numan, as I responded to Girish, this avoids the time spent on the one-time flow installation after restart (the < 10% part of the connection-break time), but I think currently the major problem is that > 90% of the time is spent waiting for the computation to finish while the OVS flows are already cleared. It is surely an optimization, but the most important thing now is to avoid that 90%. I will look at postponing flow clearing first.
>
> I thought about this again. It seems more complicated than it appeared, so let me summarize here:
>
> The connection-break time during the upgrade consists of two parts:
> 1) The time gap between flow clearing and the start of the flow installation for the fully computed flows, i.e. waiting for flow installation.
> 2) The time spent during flow installation, which takes several rounds of the ovn-controller main loop. (I take back my earlier statement that this contributes only 10% of the total time. According to the log shared by
According to the log shared by > Girish, it seems at least more than 50% of the time is spent here). > > For 1), postponing clearing flows is the solution, but it is not as easy > as I thought, because there is no easy way to determine if ovn-controller > has completed the initial computing. > When ovn-controller starts, it initializes the IDL connections with SB and > local OVSDB, and sends the initial monitor conditions to SB DB. It may take > several rounds of receiving SB notifications, update monitor conditions, > and computing to generate all flows required. If we replace the flows to > OVS before it is fully complete, it would end up with the same problem. I > can't think of an ideal and clean approach to solve the problem. However, a > "not so good" solution could be, support an option for ovn-controller > command to delay the clearing of OVS flows. It is then the operator's job > to figure out the best time to delay, according to the scale of their > environment, to reduce the time gap on waiting for the new flow > installation. This is not an ideal approach, but I think it should be > helpful for large scale environment upgrading in practise. Thoughts? > > For 2), Numan's suggestion of syncing back OVS flows before flow > installation and installing only the delta (without clearing the flows) > seems to be perfect solution. However, there are some tricky parts that > need to be considered: > 1. Apart from OVS flows, meter and group table also need to be restored > 2. The installed flows in ovn-controller require some other metadata that > is not available from OVS, such as sb_uuid. > 3. The syncing itself may take significant extra cost and further delays > the initialization. > > Alternatively, for 2), I think probably we can utilize the "bundle" > operation of OpenFlow to replace the flows in OVS atomically (on > ovs-vswitchd side) which should avoid the long connection break. I am not > sure which one is more applicable yet. 
>
> I'd also like to emphasize that even though the solution for 2) doesn't clear flows, it doesn't automatically avoid problem 1), because we will still need to figure out when the major flow computation is complete and ready to be installed/synced to OVS. Otherwise, we could replace the old, huge flow tables with a small number of incomplete flows, which still results in the same connection break.
>
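[Editor's note: the OpenFlow bundle operation mentioned for 2) is already reachable from the command line; the following dry-run sketch prints the command rather than running it. This is not what ovn-controller does today, and `new-flows.txt` is a placeholder for the freshly computed flow table.]

```shell
#!/bin/sh
# Dry-run sketch of atomic flow replacement via an OpenFlow bundle.
# ovs-ofctl's replace-flows computes the delta against the current
# table; with --bundle the changes are applied as a single atomic
# transaction on the ovs-vswitchd side.
bundle_replace_cmds() {
    # new-flows.txt is a placeholder for the desired flow table.
    echo "ovs-ofctl --bundle replace-flows br-int new-flows.txt"
}

bundle_replace_cmds
```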
I have another suggestion to handle this issue during an upgrade. Let's say br-int has ports p1, p2, ..., p10, which correspond to the logical ports p1, p2, ..., p10. Then the following can be done:

1. Create a temporary bridge, br-temp:
   ovs-vsctl add-br br-temp

2. Create the ports p1 ... p10 in br-temp with different names but with external_ids:iface-id set properly, e.g.:
   ovs-vsctl add-port br-temp temp-p1 -- set interface temp-p1 type=internal -- set interface temp-p1 external_ids:iface-id=p1
   ..
   ..
   ovs-vsctl add-port br-temp temp-p10 -- set interface temp-p10 type=internal -- set interface temp-p10 external_ids:iface-id=p10
   (I think this can be easily scripted.)

3. Just before restarting ovn-controller, run:
   ovs-vsctl set open . external_ids:ovn-bridge=br-temp

4. Restart ovn-controller after upgrading.

5. Wait till ovn-controller connects to the SB ovsdb-server and all the flows appear in br-temp.

6. Switch the ovn bridge back to br-int:
   ovs-vsctl set open . external_ids:ovn-bridge=br-int

7. Delete br-temp:
   ovs-vsctl del-br br-temp

Until step 5, there should not be any datapath impact, as br-int is untouched and all its flows remain in place. There could be some downtime after step 6, as ovn-controller may delete all the flows in br-int and re-add them, but the duration should be shorter.

Please note I have not tested this myself, but it is worth testing in a small environment before trying it on an actual deployment. You could skip step 2, but if ovn-monitor-all is false, you would still see some delay due to conditional monitoring.

This is totally under the operator/admin's control, and there is no need for any ovn-controller changes. We can still work on approach (2) and handle all the tricky parts mentioned by Han, but this may take time. Any thoughts about this?

We used a similar approach when I worked on a migration script to migrate an existing OpenStack deployment from ML2/OVS to ML2/OVN.
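[Editor's note: steps 1-3 above can be scripted as suggested; here is a dry-run sketch that only prints the ovs-vsctl commands. The port names in PORTS are example values; a real script would read them from the actual br-int port list.]

```shell
#!/bin/sh
# Dry-run sketch of steps 1-3 of the br-temp procedure: print the
# ovs-vsctl commands instead of executing them. PORTS holds example
# logical port names.
PORTS="p1 p2 p10"

br_temp_cmds() {
    # Step 1: create the temporary bridge.
    echo "ovs-vsctl add-br br-temp"
    # Step 2: recreate each port under a new name but with the same
    # external_ids:iface-id, so ovn-controller binds the same logical
    # ports and computes the same flows for br-temp.
    for p in $PORTS; do
        echo "ovs-vsctl add-port br-temp temp-$p" \
             "-- set interface temp-$p type=internal" \
             "-- set interface temp-$p external_ids:iface-id=$p"
    done
    # Step 3: point ovn-controller at br-temp before the restart.
    echo "ovs-vsctl set open . external_ids:ovn-bridge=br-temp"
}

br_temp_cmds
```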
Thanks
Numan

> Thanks,
> Han
>
>>> Thanks
>>> Numan
>>>
>>>> Thanks,
>>>> Han
>>>>
>>>> Test log:
>>>>
>>>> Check the flow count on br-int every second:
>>>>
>>>> packet_count=0 byte_count=0 flow_count=0
>>>> packet_count=0 byte_count=0 flow_count=0
>>>> packet_count=0 byte_count=0 flow_count=0
>>>> packet_count=0 byte_count=0 flow_count=0
>>>> packet_count=0 byte_count=0 flow_count=0
>>>> packet_count=0 byte_count=0 flow_count=0
>>>> packet_count=0 byte_count=0 flow_count=10322
>>>> packet_count=0 byte_count=0 flow_count=34220
>>>> packet_count=0 byte_count=0 flow_count=60425
>>>> packet_count=0 byte_count=0 flow_count=82506
>>>> packet_count=0 byte_count=0 flow_count=106771
>>>> packet_count=0 byte_count=0 flow_count=131648
>>>> packet_count=2 byte_count=120 flow_count=158303
>>>> packet_count=29 byte_count=1693 flow_count=185999
>>>> packet_count=188 byte_count=12455 flow_count=212764
>>>>
>>>> --
>>>> Winson
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "ovn-kubernetes" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to ovn-kubernetes+unsubscr...@googlegroups.com.
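[Editor's note: the counters in the test log above were presumably collected with a polling loop. The exact collection method is an assumption; the sketch below prints the sampling command rather than running it.]

```shell
#!/bin/sh
# Dry-run sketch of sampling the counters shown in the test log.
# `ovs-ofctl dump-aggregate` reports packet_count, byte_count and
# flow_count for the whole bridge in exactly that key=value format.
sample_cmd() {
    echo "ovs-ofctl dump-aggregate br-int"
}

# Polling once per second, as in the log above, would look like:
#   while sleep 1; do ovs-ofctl dump-aggregate br-int; done
sample_cmd
```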
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss