On Wed, May 9, 2018 at 12:21 PM, Mark Michelson <mmich...@redhat.com> wrote:
> On 05/07/2018 08:29 AM, Mark Michelson wrote:
>> On 05/04/2018 04:15 PM, Han Zhou wrote:
>>>
>>> On Thu, May 3, 2018 at 10:49 AM, Mark Michelson
>>> <mmich...@redhat.com> wrote:
>>>
>>>     On 05/03/2018 12:58 PM, Mark Michelson wrote:
>>>
>>>         Hi Han,
>>>
>>>         (cc'ed ovs-discuss)
>>>
>>>         I have some test results here for your incremental
>>>         branch [0].
>>>
>>>         Browbeat was the test orchestrator; it uses
>>>         ovn-scale-test [1] to configure the test parameters and run
>>>         the test.
>>>
>>>         The test uses one central node on which ovn-northd runs.
>>>         There are three farm nodes on which sandboxes are run for
>>>         ovn-controller. Each farm node runs 24 sandboxes, for a
>>>         total of 72 sandboxes (and thus 72 ovn-controller
>>>         processes).
>>>
>>>         The test uses the datapath context [2] to set up 72 logical
>>>         switches and one logical router in advance. Then, during
>>>         each test iteration, a logical switch port is added to one
>>>         of the logical switches and bound on one of the sandboxes.
>>>         The next iteration does not start until the previous
>>>         iteration is 100% complete (i.e. we see that the logical
>>>         switch port is "up" in the northbound db). The total number
>>>         of logical switch ports added during the test is 3312.
>>>
>>>         During the test, I ran `perf record` on one of the
>>>         ovn-controller processes and then created a flame graph [3]
>>>         from the results. I have attached the flame graph to this
>>>         e-mail. I think this can give us a good jumping-off point
>>>         for determining more optimizations to make to
>>>         ovn-controller.
>>>
>>>         [0] https://github.com/hzhou8/ovs/tree/ip7
>>>         [1] https://github.com/openvswitch/ovn-scale-test
>>>         [2] https://github.com/openvswitch/ovn-scale-test/pull/165
>>>         [3] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs
>>>
>>>     From the IRC meeting, it was requested to see a flame graph of
>>>     performance on OVS master. I am attaching that to this e-mail.
>>>
>>>     One difference in this test run is that the number of switch
>>>     ports was smaller (I'm not 100% sure of the exact number), so
>>>     the number of samples in perf record is lower than in the flame
>>>     graph I previously sent.
>>>
>>>     The vast majority of the time is spent in lflow_run(). Based on
>>>     this flame graph, our initial take on the matter was that we
>>>     could get improved performance by reducing the number of
>>>     logical flows to process. The incremental branch seemed like a
>>>     good testing target to that end.
>>>
>>> Thanks Mark for sharing the results!
>>> It seems you have sent the wrong attachment in your second email:
>>> perf-master.svg is still the same file as in the first email. Would
>>> you mind sending the right one? Also, please share the total CPU
>>> cost for incremental vs. master when you have the data.
>>>
>>> From your text description, it is improved as expected, since the
>>> bottleneck moved from lflow_run() to ofctrl_put(). The new
>>> bottleneck in ofctrl_put() is a good finding, and I think I have
>>> some ideas to improve that, too. Basically, when we are able to do
>>> incremental computing, we don't need to go through and compare the
>>> installed vs. desired flows every time. Instead, we can maintain a
>>> list of the changes made during the iteration and then send them to
>>> OVS. This would eliminate many of the ovn_flow_lookup() calls that
>>> currently show up as the hotspot in ofctrl_put(). I will try this
>>> as soon as I fix some corner cases and get confident about the
>>> correctness.
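>>>
>>> To illustrate, a minimal sketch of the idea in C (the names here
>>> are hypothetical, just for illustration; this is not the current
>>> ofctrl code):
>>>
>>>     #include "openvswitch/list.h"
>>>
>>>     /* Hypothetical change record, appended while flows are
>>>      * computed incrementally, so that ofctrl_put() can replay the
>>>      * changes instead of diffing the full installed vs. desired
>>>      * tables. */
>>>     enum flow_change_type {
>>>         FLOW_CHANGE_ADD,
>>>         FLOW_CHANGE_MOD,
>>>         FLOW_CHANGE_DEL,
>>>     };
>>>
>>>     struct flow_change {
>>>         struct ovs_list list_node;   /* In 'tracked_changes'. */
>>>         enum flow_change_type type;
>>>         struct ovn_flow *flow;       /* The affected flow. */
>>>     };
>>>
>>>     /* Accumulated during the iteration, then drained in
>>>      * ofctrl_put(): each entry becomes one ofputil_flow_mod
>>>      * (OFPFC_ADD / OFPFC_MODIFY_STRICT / OFPFC_DELETE_STRICT)
>>>      * sent to vswitchd, with no full-table lookups needed. */
>>>     static struct ovs_list tracked_changes
>>>         = OVS_LIST_INITIALIZER(&tracked_changes);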
>>>
>>> Thanks,
>>> Han
>>
>> Apologies, Han. Here is the proper file.
>>
>> Your conclusion is similar to what I thought about when I saw where
>> the new bottleneck was.
>>
>> I don't have a comparison of total CPU cost yet. But I can tell you
>> that, from observing CPU usage in real time during the test with the
>> master branch, the graphs appeared to be at 100% across all CPU
>> cores for the duration of the test. I have not observed CPU usage
>> during a test with the incremental branch to see if it improved.
>>
>> Mark
>
> Hello Han,
>
> Just following up on the previous e-mail. I plan to run a test soon
> with the incremental branch and observe CPU usage. I won't be able to
> share a graph, but I can at least tell you if I observe an overall
> drop in CPU usage compared to using the master branch.
>
> I took a look through ofctrl_put(), and I think that the incremental
> processing branch puts it in a good position to be improved
> dramatically.
>
> In master, we currently maintain an hmap of installed flows. When a
> loop of ovn-controller runs, we use the contents of the southbound
> database to build up an hmap of our desired flows. We then perform an
> (expensive) operation in which we reconcile the installed flows with
> the desired flows. The output of all of this is a list of
> ofputil_flow_mods that we then pass down to vswitchd.
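>
> Roughly, the reconciliation looks like this (a simplified sketch, not
> the exact ofctrl.c code; queue_flow_mod() stands in for building an
> ofputil_flow_mod and appending it to the outgoing message list):
>
>     /* Pass 1: delete installed flows that are no longer desired, and
>      * modify those whose actions changed. */
>     struct ovn_flow *i, *i_next;
>     HMAP_FOR_EACH_SAFE (i, i_next, hmap_node, &installed_flows) {
>         struct ovn_flow *d = ovn_flow_lookup(&desired_flows, i);
>         if (!d) {
>             queue_flow_mod(i, OFPFC_DELETE_STRICT);
>         } else if (!ofpacts_equal(i->ofpacts, i->ofpacts_len,
>                                   d->ofpacts, d->ofpacts_len)) {
>             queue_flow_mod(d, OFPFC_MODIFY_STRICT);
>         }
>     }
>
>     /* Pass 2: add flows that are desired but not yet installed. */
>     struct ovn_flow *d;
>     HMAP_FOR_EACH (d, hmap_node, &desired_flows) {
>         if (!ovn_flow_lookup(&installed_flows, d)) {
>             queue_flow_mod(d, OFPFC_ADD);
>         }
>     }
>
> Every installed and desired flow goes through ovn_flow_lookup() on
> every iteration, which is why it dominates the flame graph.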
> In the incremental processing branch [1], this is helped a bit. When
> we can perform incremental processing, the desired flows are no
> longer built from scratch on each loop. Instead, the desired flows
> persist and are modified based on changes observed in the database.
> ofctrl_put() is mostly unchanged, although it can exit early if it
> knows that there were no changes observed.

In fact, ofctrl_put() (in the ip7 branch) already takes care of the
no-change scenario. It returns immediately if, after some simple
evaluation, it decides that no change is needed ("flow_changed" is one
of the conditions it considers). But yes, when a change is needed, it
can be further optimized, as I mentioned in my last email.

> I've thought about this and considered two stages to potentially
> improving it.
>
> Stage 1: When performing incremental processing, don't bother dealing
> with desired flows at all. Instead, when changes are observed in the
> southbound database, modify the installed flows directly and create
> ofputil_flow_mods based on the changes. When it is time to run
> ofctrl_put(), we can pass the generated ofputil_flow_mods directly to
> vswitchd without having to perform any sort of reconciliation between
> hmaps. We still have to maintain the installed flows, because full
> runs through the southbound database will need to perform the
> "classic" version of ofctrl_put() and will need an up-to-date table
> of installed flows.

I was thinking about the same, but then figured out that we still need
the desired flows because of a corner case: different logical flows
can generate the same OVS flow (e.g. when two identical ACLs are
created for the same OVN port due to client behavior, which is real at
least in Neutron), but duplicated flows can't exist in the installed
flows. If we don't maintain a desired flow table that stores multiple
entries for duplicated flows, then when we handle a logical flow
deletion we don't know whether the corresponding installed flow(s)
is/are still being used by any other logical flow. This could be
addressed by changing the installed flow table structure to maintain
the M:N mapping between logical flows and OVS flows, and then the
desired flow table could be removed. However, keeping the desired flow
table doesn't hurt either (apart from the memory consumption), and I
am not sure whether there are other reasons we will need it (maybe it
helps when full computation is needed). When incremental processing
takes effect, we can still avoid the costly full-table comparison by
maintaining a list of the changes while continuing to update the
desired flow table.
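For example, the M:N mapping could look something like this (a rough
sketch with hypothetical names; this is not code from the branch):

    #include "openvswitch/hmap.h"
    #include "openvswitch/list.h"
    #include "openvswitch/match.h"
    #include "openvswitch/ofp-actions.h"
    #include "uuid.h"

    /* Hypothetical: each installed OVS flow remembers which logical
     * flows produced it.  The flow is only deleted from the switch
     * when the last referencing logical flow goes away. */
    struct installed_flow {
        struct hmap_node hmap_node;  /* In the installed flow table. */
        struct match match;
        struct ofpact *ofpacts;
        size_t ofpacts_len;
        struct ovs_list lflow_refs;  /* Contains 'struct lflow_ref's. */
    };

    struct lflow_ref {
        struct ovs_list node;        /* In 'lflow_refs' above. */
        struct uuid lflow_uuid;      /* UUID of the SB logical flow. */
    };

    /* On logical flow deletion: remove the matching lflow_ref, and
     * only if 'lflow_refs' becomes empty, emit OFPFC_DELETE_STRICT
     * for the OVS flow. */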
> Stage 2: If the code ever reaches a point where incremental
> processing is guaranteed to be performed on every loop iteration of
> ovn-controller, then ovn-controller could become stateless with
> regard to installed flows. There would be no need to maintain the
> installed flows, since we could just generate ofputil_flow_mods based
> on how the southbound database has changed. We could then pass those
> modifications down to vswitchd.

This would be the ideal situation, but it seems to me not very
realistic for ovn-controller, since there are just too many
dependencies. Probably I am just not convinced of the value vs. the
effort to achieve that. Maybe it is OK to recompute for changes that
don't happen very often?

> I think similar changes could be implemented for the group and meter
> extend tables, but they are not nearly as important as the flow
> table.
>
> Do my suggestions make sense to you? Are they in line with what you
> had been thinking about?

Of course! Thanks for sharing your good thoughts. I am glad this work
helps, and let's improve it further :)

> Thanks,
> Mark
>
> [1] Currently I'm looking at https://github.com/hzhou8/tree/ip7 . I
> noticed that there is an ip8 branch as well, but since I have been
> using ip7 in tests, I decided to look at it as a reference.

I will keep the branch ip7 untouched and let you know when I get ip8
ready.

Thanks,
Han