On Fri, Aug 30, 2019 at 6:46 AM Mark Michelson <mmich...@redhat.com> wrote:
>
> On 8/30/19 5:39 AM, Daniel Alvarez Sanchez wrote:
> > On Thu, Aug 29, 2019 at 10:01 PM Mark Michelson <mmich...@redhat.com> wrote:
> >>
> >> On 8/29/19 2:39 PM, Numan Siddique wrote:
> >>> Hello Everyone,
> >>>
> >>> In one of the OVN deployments, we are seeing 100% CPU usage by
> >>> ovn-controllers all the time.
> >>>
> >>> After investigation, we found the following:
> >>>
> >>>    - ovn-controller is taking more than 20 seconds to complete a full
> >>> loop (mainly in the lflow_run() function).
> >>>
> >>>    - The physical switch is sending GARPs periodically every 10 seconds.
> >>>
> >>>    - There are ovn-bridge-mappings configured, and these GARP packets
> >>> reach br-int via the patch port.
> >>>
> >>>    - We have a flow in the router pipeline which applies the put_arp
> >>> action if the packet is an ARP packet.
> >>>
> >>>    - The ovn-controller pinctrl thread receives these GARPs, stores the
> >>> learnt MAC-IP bindings in the 'put_mac_bindings' hmap, and notifies the
> >>> ovn-controller main thread by incrementing the seq no.
> >>>
> >>>    - In the ovn-controller main thread, after lflow_run() finishes,
> >>> pinctrl_wait() is called. This function calls poll_immediate_wake() because
> >>> the 'put_mac_bindings' hmap is not empty.
> >>>
> >>>    - This causes ovn-controller's poll_block() to not sleep at all, and
> >>> this repeats every iteration, resulting in 100% CPU usage.
> >>>
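
To make the wake-up path concrete, the spinning described above is roughly the
following pattern (a simplified sketch of the main loop with the pinctrl_thread
backport, not the literal ovn-controller code):

    /* One iteration of the ovn-controller main loop (sketch). */
    lflow_run(...);      /* 20+ seconds in this deployment. */
    pinctrl_wait(...);   /* Sees that 'put_mac_bindings' is non-empty and
                          * therefore calls poll_immediate_wake(). */
    poll_block();        /* Returns immediately instead of sleeping, so the
                          * loop starts over and spins at 100% CPU. */
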
> >>> The deployment has OVS/OVN 2.9.  We have backported the pinctrl_thread
> >>> patch.
> >>>
> >>> Some time back I reported an issue about lflow_run() taking a lot of
> >>> time - https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
> >>>
> >>> I think we need to improve the logical flow processing sooner or later.
> >>
> >> I agree that this is very important. I know that logical flow processing
> >> is the biggest bottleneck for ovn-controller, but 20 seconds is just
> >> ridiculous. In your scale testing, you found that lflow_run() was taking
> >> 10 seconds to complete.
> > I support this statement 100% (20 seconds is just ridiculous). To be
> > precise, in this deployment we see over 23 seconds for the main loop
> > to process, and I've sometimes seen even 30 seconds. I've been talking
> > to Numan these days about this issue and I support profiling this
> > actual deployment so that we can figure out how incremental processing
> > would help.
> >
> >>
> >> I'm curious if there are any factors in this particular deployment's
> >> configuration that might contribute to this. For instance, does this
> >> deployment have a glut of ACLs? Are they not using port groups?
> > They're not using port groups because they're not available in 2.9.
> > However, I don't think port groups would make a big difference in
> > terms of ovn-controller computation. I might be wrong but Port Groups
> > help reduce the number of ACLs in the NB database while the # of
> > Logical Flows would still remain the same. We'll try to get the
> > contents of the NB database and figure out what's killing it.
> >
>
> You're right that port groups won't reduce the number of logical flows.

I think port groups reduce the number of logical flows significantly, and also
reduce the number of OVS flows when conjunctive matches are effective.
Please see my calculation here:
https://www.slideshare.net/hanzhou1978/large-scale-overlay-networks-with-ovn-problems-and-solutions/30
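
As a rough example: an ACL applied to a port group compiles to a single logical
flow whose match references the group and its auto-generated address set,
something like

    outport == @pg1 && ip4 && ip4.src == $pg1_ip4 && tcp.dst == 80

With N ports in the group and M addresses in the address set, expanding that
match naively would need on the order of N * M OpenFlow flows, whereas a
conjunctive match needs roughly N + M flows plus the conjunction flow itself.
Without port groups you would also carry one ACL (and its logical flows) per
member port in the NB database instead of a single ACL per group. (Numbers here
are illustrative, not from this deployment.)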

> However, it can reduce the computation in ovn-controller. The reason is
> that the logical flows generated by ACLs that use port groups may result
> in conjunctive matches being used. If you want a bit more information,
> see the "Port groups" section of this blog post I wrote:
>
> https://developers.redhat.com/blog/2019/01/02/performance-improvements-in-ovn-past-and-future/
>
> The TL;DR is that with port groups, I saw the number of OpenFlow flows
> generated by ovn-controller drop by 3 orders of magnitude. And that
> meant that flow processing was 99% faster for large networks.
>
> You may not see the same sort of improvement for this deployment, mainly
> because my test case was tailored to illustrate how port groups help.
> There may be other factors in this deployment that complicate flow
> processing.
>
> >>
> >> This particular deployment's configuration may give us a good scenario
> >> for our testing to improve lflow processing time.
> > Absolutely!
> >>
> >>>
> >>> But to fix this issue urgently, we are thinking of the below approach.
> >>>
> >>>    - pinctrl_thread will locally cache the mac_binding entries (just like
> >>> it caches the dns entries). (Please note that pinctrl_thread cannot access
> >>> the SB DB IDL.)
> >>>
> >>>    - Upon receiving any ARP packet (via the put_arp action), pinctrl_thread
> >>> will check the local mac_binding cache and will wake up the main
> >>> ovn-controller thread only if a mac_binding update is required.
> >>>
> >>> This approach will solve the issue since the MACs sent by the physical
> >>> switches will not change, so there is no need to wake up the
> >>> ovn-controller main thread.
> >>
> >> I think this can work well. We have a lot of what's needed already in
> >> pinctrl at this point. We have the hash table of mac bindings already.
> >> Currently, we flush this table after we write the data to the southbound
> >> database. Instead, we would keep the bindings in memory. We would need
> >> to ensure that the in-memory MAC bindings eventually get deleted if they
> >> become stale.
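
For what it's worth, here is a minimal sketch of what such a cache could look
like in the pinctrl thread. It is keyed only on the IPv4 address to keep the
example short (the real MAC_Binding is per logical port/datapath), and
notify_main_thread() is a placeholder for however the backport wakes the main
thread (the seq increment mentioned above):

    /* Sketch only; relies on OVS lib/hmap.h, lib/hash.h, lib/packets.h,
     * lib/timeval.h, and lib/util.h. */
    struct mac_cache_entry {
        struct hmap_node node;    /* In 'mac_cache', hashed on 'ip'. */
        ovs_be32 ip;              /* Learnt IPv4 address. */
        struct eth_addr mac;      /* Learnt MAC address. */
        long long int last_seen;  /* time_msec() of last GARP, for expiry. */
    };

    static struct hmap mac_cache = HMAP_INITIALIZER(&mac_cache);

    /* Called from the pinctrl thread for every packet hitting put_arp. */
    static void
    maybe_learn_mac(ovs_be32 ip, const struct eth_addr *mac)
    {
        uint32_t hash = hash_bytes(&ip, sizeof ip, 0);
        struct mac_cache_entry *e;

        HMAP_FOR_EACH_WITH_HASH (e, node, hash, &mac_cache) {
            if (e->ip == ip) {
                e->last_seen = time_msec();
                if (eth_addr_equals(e->mac, *mac)) {
                    return;          /* Same binding: don't wake main thread. */
                }
                e->mac = *mac;       /* MAC changed: update and notify. */
                notify_main_thread();
                return;
            }
        }

        e = xmalloc(sizeof *e);
        e->ip = ip;
        e->mac = *mac;
        e->last_seen = time_msec();
        hmap_insert(&mac_cache, &e->node, hash);
        notify_main_thread();        /* New binding: main thread writes SB. */
    }

Entries whose last_seen is older than some threshold could then be dropped by a
periodic sweep in the same thread, which would cover the staleness concern
above.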
> >>
> >>>
> >>> In the present master/2.12, these GARPs will not cause this 100% CPU loop
> >>> issue because incremental processing will not recompute flows.
> >>
> >> Another mitigating factor for master is something I'm currently working
> >> on. I've got the beginnings of a patch series going where I am
> >> separating pinctrl into a separate process from ovn-controller:
> >> https://github.com/putnopvut/ovn/tree/pinctrl_process
> >>
> >> It's in the early stages right now, so please don't judge :)
> >>
> >> Separating pinctrl into its own process means that it cannot directly
> >> cause ovn-controller to wake up like it currently might.
> >>
> >>>
> >>> Even though the above approach is not really required for master/2.12, I
> >>> think it is still OK to have it there, as there is no harm.
> >>>
> >>> I would like to know your comments and any concerns you may have.
> >>
> >> Hm, I don't really understand why we'd want to put this in master/2.12
> >> if the problem doesn't exist there. The main concern I have is with
> >> regard to cache lifetime. I don't want to introduce potential memory
> >> growth concerns into a branch if it's not necessary.
> >>
> >> Is there a way for us to get this included in 2.9-2.11 without having to
> >> put it in master or 2.12? It's hard to classify this as a bug fix,
> >> really, but it does prevent unwanted behavior in real-world setups.
> >> Could we get an opinion from committers on this?
> >>
> >>>
> >>> Thanks
> >>> Numan
> >>>