On Fri, Aug 30, 2019 at 6:46 AM Mark Michelson <mmich...@redhat.com> wrote:
>
> On 8/30/19 5:39 AM, Daniel Alvarez Sanchez wrote:
> > On Thu, Aug 29, 2019 at 10:01 PM Mark Michelson <mmich...@redhat.com> wrote:
> >>
> >> On 8/29/19 2:39 PM, Numan Siddique wrote:
> >>> Hello Everyone,
> >>>
> >>> In one of the OVN deployments, we are seeing 100% CPU usage by
> >>> ovn-controllers all the time.
> >>>
> >>> After investigation we found the below:
> >>>
> >>> - ovn-controller is taking more than 20 seconds to complete a full
> >>> loop (mainly in the lflow_run() function).
> >>>
> >>> - The physical switch is sending GARPs periodically every 10 seconds.
> >>>
> >>> - There are ovn-bridge-mappings configured, and these GARP packets
> >>> reach br-int via the patch port.
> >>>
> >>> - We have a flow in the router pipeline which applies the put_arp
> >>> action if it is an ARP packet.
> >>>
> >>> - The ovn-controller pinctrl thread receives these GARPs, stores the
> >>> learnt mac-ips in the 'put_mac_bindings' hmap, and notifies the
> >>> ovn-controller main thread by incrementing the seq no.
> >>>
> >>> - In the ovn-controller main thread, after lflow_run() finishes,
> >>> pinctrl_wait() is called. This function calls poll_immediate_wake()
> >>> as the 'put_mac_bindings' hmap is not empty.
> >>>
> >>> - This causes the ovn-controller poll_block() to not sleep at all,
> >>> and this repeats all the time, resulting in 100% CPU usage.
> >>>
> >>> The deployment has OVS/OVN 2.9. We have backported the pinctrl_thread
> >>> patch.
> >>>
> >>> Some time back I had reported an issue about lflow_run() taking a lot
> >>> of time - https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
> >>>
> >>> I think we need to improve the logical processing sooner or later.
> >>
> >> I agree that this is very important. I know that logical flow
> >> processing is the biggest bottleneck for ovn-controller, but 20
> >> seconds is just ridiculous. In your scale testing, you found that
> >> lflow_run() was taking 10 seconds to complete.
> > I support this statement 100% (20 seconds is just ridiculous). To be
> > precise, in this deployment we see over 23 seconds for the main loop
> > to process, and I've seen even 30 seconds at times. I've been talking
> > to Numan these days about this issue, and I support profiling this
> > actual deployment so that we can figure out how incremental processing
> > would help.
> >
> >>
> >> I'm curious if there are any factors in this particular deployment's
> >> configuration that might contribute to this. For instance, does this
> >> deployment have a glut of ACLs? Are they not using port groups?
> > They're not using port groups because it's 2.9 and the feature is not
> > there. However, I don't think port groups would make a big difference
> > in terms of ovn-controller computation. I might be wrong, but port
> > groups help reduce the number of ACLs in the NB database while the
> > number of logical flows would still remain the same. We'll try to get
> > the contents of the NB database and figure out what's killing it.
> >
>
> You're right that port groups won't reduce the number of logical flows.
I think port groups reduce the number of logical flows significantly, and
they also reduce the number of OVS flows when conjunctive matches are
effective. Please see my calculation here:
https://www.slideshare.net/hanzhou1978/large-scale-overlay-networks-with-ovn-problems-and-solutions/30

> However, it can reduce the computation in ovn-controller. The reason is
> that the logical flows generated by ACLs that use port groups may result
> in conjunctive matches being used. If you want a bit more information,
> see the "Port groups" section of this blog post I wrote:
>
> https://developers.redhat.com/blog/2019/01/02/performance-improvements-in-ovn-past-and-future/
>
> The TL;DR is that with port groups, I saw the number of OpenFlow flows
> generated by ovn-controller drop by three orders of magnitude. And that
> meant that flow processing was 99% faster for large networks.
>
> You may not see the same sort of improvement for this deployment, mainly
> because my test case was tailored to illustrate how port groups help.
> There may be other factors in this deployment that complicate flow
> processing.
>
> >>
> >> This particular deployment's configuration may give us a good scenario
> >> for our testing to improve lflow processing time.
> > Absolutely!
> >>
> >>>
> >>> But to fix this issue urgently, we are thinking of the below approach.
> >>>
> >>> - pinctrl_thread will locally cache the mac_binding entries (just like
> >>> it caches the dns entries). (Please note that pinctrl_thread cannot
> >>> access the SB DB IDL.)
> >>
> >>>
> >>> - Upon receiving any ARP packet (via the put_arp action),
> >>> pinctrl_thread will check the local mac_binding cache and will wake
> >>> up the main ovn-controller thread only if a mac_binding update is
> >>> required.
> >>>
> >>> This approach will solve the issue since the MACs sent by the physical
> >>> switches will not change, so there is no need to wake up the
> >>> ovn-controller main thread.
> >>
> >> I think this can work well. We have a lot of what's needed already in
> >> pinctrl at this point. We have the hash table of mac bindings already.
> >> Currently, we flush this table after we write the data to the
> >> southbound database. Instead, we would keep the bindings in memory. We
> >> would need to ensure that the in-memory MAC bindings eventually get
> >> deleted if they become stale.
> >>
> >>>
> >>> In the present master/2.12, these GARPs will not cause this 100% CPU
> >>> loop issue because incremental processing will not recompute flows.
> >>
> >> Another mitigating factor for master is something I'm currently working
> >> on. I've got the beginnings of a patch series going where I am
> >> separating pinctrl into a separate process from ovn-controller:
> >> https://github.com/putnopvut/ovn/tree/pinctrl_process
> >>
> >> It's in the early stages right now, so please don't judge :)
> >>
> >> Separating pinctrl into its own process means that it cannot directly
> >> cause ovn-controller to wake up like it currently might.
> >>
> >>>
> >>> Even though the above approach is not really required for master/2.12,
> >>> I think it is still OK to have it there, as there is no harm.
> >>>
> >>> I would like to know your comments and concerns, if any.
> >>
> >> Hm, I don't really understand why we'd want to put this in master/2.12
> >> if the problem doesn't exist there. The main concern I have is with
> >> regard to cache lifetime. I don't want to introduce potential memory
> >> growth concerns into a branch if it's not necessary.
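To make the cached-lookup idea a bit more concrete, below is a rough sketch
of what the check in the pinctrl thread could look like. This is only an
illustration: the names (struct mac_cache_entry, cache_lookup(),
mac_binding_needs_update(), CACHE_MAX_AGE_SEC) are invented for the example
and are not existing ovn-controller symbols, and the real code keeps the
learnt bindings in the put_mac_bindings hmap rather than a fixed array. The
timestamp is only there to show one way of addressing the cache-lifetime
concern above.

/* Illustrative sketch only -- NOT the actual ovn-controller code.
 * Idea: pinctrl keeps its own cache of learnt (datapath, port, ip) -> mac
 * entries and only wakes the main thread when an entry is new or the MAC
 * actually changed.  Entries carry a timestamp so stale bindings can be
 * aged out. */

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define CACHE_SIZE 4096          /* arbitrary for the sketch */
#define CACHE_MAX_AGE_SEC 300    /* arbitrary aging interval */

struct mac_cache_entry {
    bool     in_use;
    uint32_t dp_key;             /* datapath tunnel key */
    uint32_t port_key;           /* logical port tunnel key */
    uint32_t ip;                 /* learnt IPv4 address */
    uint8_t  mac[6];             /* learnt MAC */
    time_t   updated;            /* last time we saw this binding */
};

static struct mac_cache_entry cache[CACHE_SIZE];

static struct mac_cache_entry *
cache_lookup(uint32_t dp_key, uint32_t port_key, uint32_t ip)
{
    uint32_t slot = (dp_key ^ port_key ^ ip) % CACHE_SIZE;
    struct mac_cache_entry *e = &cache[slot];

    if (e->in_use && e->dp_key == dp_key && e->port_key == port_key
        && e->ip == ip) {
        return e;
    }
    return NULL;   /* (hash collisions are ignored in this sketch) */
}

/* Called from the pinctrl thread for every packet that hits the put_arp
 * action.  Returns true only when the main thread needs to be woken up to
 * update the SB MAC_Binding table. */
bool
mac_binding_needs_update(uint32_t dp_key, uint32_t port_key,
                         uint32_t ip, const uint8_t mac[6])
{
    struct mac_cache_entry *e = cache_lookup(dp_key, port_key, ip);
    time_t now = time(NULL);

    if (e && now - e->updated <= CACHE_MAX_AGE_SEC
        && !memcmp(e->mac, mac, 6)) {
        /* Same binding seen recently (e.g. the periodic GARPs from the
         * physical switch): refresh the timestamp and do not wake the
         * main thread, so no flow recompute is triggered. */
        e->updated = now;
        return false;
    }

    /* New or changed binding: (re)populate the slot and wake up the main
     * thread so it can write the SB MAC_Binding row. */
    uint32_t slot = (dp_key ^ port_key ^ ip) % CACHE_SIZE;
    e = &cache[slot];
    e->in_use = true;
    e->dp_key = dp_key;
    e->port_key = port_key;
    e->ip = ip;
    memcpy(e->mac, mac, 6);
    e->updated = now;
    return true;
}

The aging sweep for expired entries could piggyback on the existing pinctrl
run that flushes bindings to the southbound database, so no extra timer
would be needed.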
> >>
> >> Is there a way for us to get this included in 2.9-2.11 without having
> >> to put it in master or 2.12? It's hard to classify this as a bug fix,
> >> really, but it does prevent unwanted behavior in real-world setups.
> >> Could we get an opinion from committers on this?
> >>
> >>>
> >>> Thanks
> >>> Numan
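As a footnote for anyone who hasn't looked at this code path recently, the
wake-up loop described at the top of the thread boils down to roughly the
following. This is a simplified stand-in, not the actual ovn-controller
source: only pinctrl_wait(), poll_immediate_wake(), poll_block() and the
put_mac_bindings hmap correspond to real names, and their bodies here are
reduced to the bare behavior that matters for the bug.

/* Heavily simplified illustration of the busy loop; NOT the real code. */

#include <stdbool.h>
#include <stdio.h>

static bool put_mac_bindings_nonempty;   /* set by the pinctrl thread */
static bool immediate_wake_requested;

/* Stand-ins for the real poll-loop primitives. */
static void
poll_immediate_wake(void)
{
    immediate_wake_requested = true;
}

static void
poll_block(void)
{
    if (immediate_wake_requested) {
        immediate_wake_requested = false;
        return;                  /* wake up immediately: no sleep at all */
    }
    /* ... otherwise the real code sleeps until an FD or timer fires ... */
}

/* After lflow_run() and friends, the main loop asks every module what it
 * is waiting for.  pinctrl asks for an immediate wakeup whenever there are
 * buffered MAC bindings to commit to the SB database. */
static void
pinctrl_wait(void)
{
    if (put_mac_bindings_nonempty) {
        poll_immediate_wake();
    }
}

int
main(void)
{
    for (int i = 0; i < 3; i++) {        /* a few iterations for the demo */
        /* lflow_run(): the 20+ second full recompute happens here. */

        /* The physical switch GARPs every 10 seconds, so by the time a
         * 20+ second iteration finishes, pinctrl has buffered new bindings
         * again and the hmap is never observed empty. */
        put_mac_bindings_nonempty = true;

        pinctrl_wait();
        poll_block();                    /* returns immediately */
        printf("iteration %d: did not sleep\n", i);
    }
    return 0;
}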