On Fri, Aug 30, 2019 at 12:37 AM Han Zhou <zhou...@gmail.com> wrote:
>
> On Thu, Aug 29, 2019 at 11:40 AM Numan Siddique <nusid...@redhat.com> wrote:
> >
> > Hello Everyone,
> >
> > In one of the OVN deployments, we are seeing 100% CPU usage by
> > ovn-controllers all the time.
> >
> > After investigation we found the following:
> >
> > - ovn-controller is taking more than 20 seconds to complete a full loop
> >   (mainly in the lflow_run() function).
> >
> > - The physical switch is sending GARPs periodically every 10 seconds.
> >
> > - There is ovn-bridge-mappings configured and these GARP packets
> >   reach br-int via the patch port.
> >
> > - We have a flow in the router pipeline which applies the action
> >   put_arp if it is an ARP packet.
> >
> > - The ovn-controller pinctrl thread receives these GARPs, stores the
> >   learnt mac-ips in the 'put_mac_bindings' hmap and notifies the
> >   ovn-controller main thread by incrementing the seq no.
> >
> > - In the ovn-controller main thread, after lflow_run() finishes,
> >   pinctrl_wait() is called. This function calls poll_immediate_wake()
> >   as the 'put_mac_bindings' hmap is not empty.
> >
> > - This causes the ovn-controller poll_block() to not sleep at all and
> >   this repeats all the time, resulting in 100% CPU usage.
> >
> > The deployment has OVS/OVN 2.9. We have backported the pinctrl_thread
> > patch.
> >
> > Some time back I had reported an issue about lflow_run() taking a lot
> > of time -
> > https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
> >
> > I think we need to improve the logical flow processing sooner or later.
> >
> > But to fix this issue urgently, we are thinking of the below approach.
> >
> > - pinctrl_thread will locally cache the mac_binding entries (just like
> >   it caches the dns entries). (Please note pinctrl_thread cannot access
> >   the SB DB IDL.)
> >
> > - Upon receiving any ARP packet (via the put_arp action), pinctrl_thread
> >   will check the local mac_binding cache and will wake up the main
> >   ovn-controller thread only if a mac_binding update is required.
> >
> > This approach will solve the issue since the MAC sent by the physical
> > switches will not change. So there is no need to wake up the
> > ovn-controller main thread.
> >
> > In the present master/2.12 these GARPs will not cause this 100% CPU loop
> > issue because incremental processing will not recompute flows.
> >
> > Even though the above approach is not really required for master/2.12, I
> > think it is still OK to have this as there is no harm.
> >
> > I would like to know your comments and concerns, if any.
> >
> > Thanks
> > Numan
> >
>
> Hi Numan,
>
> I think this approach should work. Just to make sure, to update the cache
> efficiently (to avoid another kind of recompute), it should use ovsdb
> change-tracking to update it incrementally.
>
> Regarding master/2.12, it is not harmful except that it will add some more
> code and increase the memory footprint. For our current use cases, there
> can easily be 10,000s of mac_bindings, but it may still be OK because each
> entry is very small. However, is there any benefit to doing this in
> master/2.12?
>

I don't see much benefit. But I can't submit a patch to branch 2.9 without
the fix getting merged in master first, right? Maybe once it is merged in
branch 2.9, we can consider deleting it?

Thanks
Numan

>
> Thanks,
> Han
>
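
For reference, the cache check proposed in the quoted message could look roughly like the sketch below. This is only an illustration under stated assumptions, not the actual pinctrl code: the names mac_cache, mac_cache_entry, mac_cache_learn() and MAC_CACHE_SLOTS are hypothetical, the table is a plain fixed-size open-addressing hash rather than an OVS hmap, and the cache is updated directly by the learning path instead of being refreshed from the SB DB through change tracking as Han suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table size; must be a power of two for the index mask. */
#define MAC_CACHE_SLOTS 1024

/* One cached binding, keyed by (datapath key, port key, IPv4 address). */
struct mac_cache_entry {
    bool in_use;
    uint32_t dp_key;
    uint32_t port_key;
    uint32_t ip;
    uint8_t mac[6];
};

struct mac_cache {
    struct mac_cache_entry slots[MAC_CACHE_SLOTS];
};

/* Simple multiplicative hash over the three key fields. */
static uint32_t
mac_cache_hash(uint32_t dp_key, uint32_t port_key, uint32_t ip)
{
    uint32_t h = dp_key * 2654435761u;
    h ^= port_key * 2246822519u;
    h ^= ip * 3266489917u;
    return h ^ (h >> 16);
}

/* Records the binding and returns true if it is new or its MAC changed,
 * i.e. the main thread would need to be woken up.  Returns false if the
 * cache already holds exactly this binding, so the wakeup can be skipped. */
static bool
mac_cache_learn(struct mac_cache *cache, uint32_t dp_key, uint32_t port_key,
                uint32_t ip, const uint8_t mac[6])
{
    assert(cache != NULL && mac != NULL);

    uint32_t idx = mac_cache_hash(dp_key, port_key, ip);
    for (uint32_t probe = 0; probe < MAC_CACHE_SLOTS; probe++) {
        struct mac_cache_entry *e =
            &cache->slots[(idx + probe) & (MAC_CACHE_SLOTS - 1)];

        if (!e->in_use) {
            /* Key not cached yet: store it and request an update. */
            e->in_use = true;
            e->dp_key = dp_key;
            e->port_key = port_key;
            e->ip = ip;
            memcpy(e->mac, mac, sizeof e->mac);
            return true;
        }
        if (e->dp_key == dp_key && e->port_key == port_key && e->ip == ip) {
            if (!memcmp(e->mac, mac, sizeof e->mac)) {
                return false;   /* Same MAC as before: nothing to do. */
            }
            memcpy(e->mac, mac, sizeof e->mac);
            return true;        /* MAC changed: update is required. */
        }
    }
    /* Table full: be conservative and report that an update is needed. */
    return true;
}
```

With something like this, the put_arp handler would call mac_cache_learn() for each GARP and only add to 'put_mac_bindings' and notify the main thread when it returns true. A periodic GARP with an unchanged MAC would then return false and leave the main loop asleep. A real implementation would also have to drop cached entries when the corresponding SB mac_binding row is deleted, which this sketch does not handle.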
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss