On Tue, Jun 5, 2018 at 12:00 PM, Mark Michelson <mmich...@redhat.com> wrote:
> On 06/05/2018 01:02 PM, Mark Michelson wrote:
>> On 06/05/2018 12:40 PM, Han Zhou wrote:
>>> On Fri, May 18, 2018 at 2:03 PM, Han Zhou <zhou...@gmail.com> wrote:
>>>
>>> Hi Mark,
>>>
>>> Thank you so much for sharing this data. Please see my comments inline.
>>>
>>> On Fri, May 18, 2018 at 1:31 PM, Mark Michelson <mmich...@redhat.com> wrote:
>>>
>>> Hi Han, I finally did some tests and looked at the CPU usage between
>>> master and the ip7 branch.
>>>
>>> On the machines running ovn-controller:
>>> Master branch: climbs to around 100% over the course of 3 minutes,
>>> oscillates close to 100% for about 10 minutes, and then is pegged at
>>> 100% for the rest of the test. Total test time was about 23 minutes.
>>> ip7 branch: oscillates between 10% and 25% for the first 10 minutes of
>>> the test, then hovers around 10% for the rest. Total test time was
>>> about 19 minutes.
>>>
>>> This is aligned with my observation of ~90% improvement in CPU cost.
>>>
>>> For throughput/total time, the improvement ratio is different (in my
>>> test case the execution time dropped by ~50%), but I think it can be
>>> explained. The total execution time does not accurately reflect the
>>> efficiency of the processing, because when CPU is at 100%,
>>> ovn-controller is slowed down, which may just result in fewer
>>> iterations during the whole test. The stop-watch profiling mechanism
>>> you implemented (also rebased into the incremental processing branch)
>>> will be able to tell the truth. The real impact is longer latency for
>>> handling a change in the control plane, so I also use latency to
>>> evaluate the improvement. The way I test latency is with
>>> ovn-nbctl --wait=hv, together with the nb_cfg improvement
>>> (https://patchwork.ozlabs.org/patch/899608/).
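Concretely, the `--wait=hv` style of latency measurement can be sketched as below. This is a rough illustration only: the switch and port names are placeholders, and it assumes a host where ovn-nbctl can reach the northbound database. `--wait=hv` makes the command return only after all hypervisors have caught up with the change (via nb_cfg), so the wall-clock time approximates end-to-end control-plane latency rather than just database commit time.

```shell
#!/bin/sh
# Sketch: measure how long a northbound change takes to be realized
# on every chassis, not just committed to the NB database.
# "ls1" and "lsp1" are placeholder names, not from the original test.

ovn-nbctl ls-add ls1

# --wait=hv blocks until all hypervisors report they have processed
# the change, so this timing includes ovn-northd and ovn-controller.
time ovn-nbctl --wait=hv lsp-add ls1 lsp1

# Clean up the test objects.
ovn-nbctl ls-del ls1
```

This requires the nb_cfg propagation fix referenced above to give accurate numbers on a multi-chassis deployment.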
>>> When I switched over to tests that have ACLs:
>>> Master branch: behaves about the same as when no ACLs are used. Total
>>> test time was about 28 minutes.
>>> ip7 branch: CPU usage hovered around 30% for the entirety of the test,
>>> with spikes around 50% a couple of times. Total test time was about
>>> 25 minutes.
>>>
>>> Since I had not done it yet, I also ran perf while running the
>>> incremental branch with ACLs. I am attaching the flame graph here. The
>>> gist is that, much like on the master branch, the majority of CPU time
>>> is spent processing logical flows.
>>>
>>> Seeing the drop in CPU usage between the master branch and the ip7
>>> branch makes me think it is worth investigating other areas that may
>>> be the bottleneck. I monitored memory, disk usage, and network usage
>>> on the machines, but I didn't see anything that looked like an obvious
>>> cause of delay.
>>>
>>> The CPU drop between master and ip7 when testing with ACLs is, to my
>>> understanding, most likely because incremental processing avoids
>>> recomputing flows when irrelevant input such as pinctrl/ofctrl
>>> messages (e.g. probe/echo) arrives, while on master any of these
>>> inputs would trigger a recompute.
>>>
>>> CPU-wise, I think the biggest improvements that can be made to the
>>> incremental processing branch are:
>>> * Adding a change handler for the Address_Set table.
>>> * The ofctrl_put() improvements we have discussed.
>>>
>>> I think these will noticeably improve our test times. However, based
>>> on how much the CPU usage dropped just from switching to the
>>> incremental processing branch, I think there are likely some other
>>> bottlenecks in our tests that would be more impactful to remove. We
>>> already know that "ovn_network.bind_port" and "ovn_network.wait_port_up"
>>> (in ovn-scale-test terminology) are the operations in our test
>>> iterations that take the longest.
>>> If we can break those down into smaller pieces, we can potentially
>>> zero in on what to target next.
>>>
>>> I am not sure whether there are any other *big* bottlenecks, but the
>>> address-set/port-group and ofctrl_put() improvements are surely
>>> needed :)
>>> The latest patch I provided is from my ip9 branch, which was rebased
>>> on master this week, with some code refactors. Feel free to try it,
>>> but don't expect any performance difference.
>>>
>>> Hi Mark,
>>>
>>> Do you still have the same environment to try out the address-set
>>> incremental processing patches, to see if they improve the test
>>> results for ACLs with per-port address-set updates?
>>> The patch is v3:
>>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=48060
>>> It is also in branch ip11.
>>>
>>> Thanks,
>>> Han
>>
>> As a matter of fact, I saw the ip11 branch this past Friday and gave it
>> a test during the weekend. I didn't run perf during the test, but based
>> solely on the time the test took to run, it was improved. For the test,
>> I ran with 3312 iterations. In the results I reported earlier in this
>> thread, we were doing 864 iterations, so I don't have an
>> apples-to-apples comparison at the moment. I will run an 864-iteration
>> test and see how it compares to the earlier numbers. I'll report back
>> when I have numbers.
>
> I ran the test with ACLs with 864 iterations. The results are nearly
> exactly the same as when I had run the ip7 branch with no ACLs. That is,
> it took around 19 minutes to run the test, and the CPU usage hovered
> around 10% for the test. I also ran perf. The flame graph shows what we
> would expect by this point: the majority of processing time in
> ovn-controller is spent in ofctrl_put().
>
> So I'd say that address set incremental processing is successful in our
> tests. Great job!

Wonderful news! Thanks a lot Mark, and I will add the numbers and
tested-by in the commit message when I submit v4.
(cc Ben since he is reviewing the patch)
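For anyone reproducing the profiling discussed in this thread, the flame graphs were presumably generated with a workflow along these lines; the exact invocations are not shown above, so treat the FlameGraph script location, sampling duration, and output name as assumptions:

```shell
#!/bin/sh
# Sketch: sample ovn-controller's call stacks with perf and render a
# flame graph. Assumes Brendan Gregg's FlameGraph scripts are checked
# out in ~/FlameGraph; the 60-second window is arbitrary.

# Record call-graph samples from the running ovn-controller process.
perf record -g -p "$(pidof ovn-controller)" -- sleep 60

# Fold the recorded stacks and render an interactive SVG flame graph.
perf script | ~/FlameGraph/stackcollapse-perf.pl \
    | ~/FlameGraph/flamegraph.pl > ovn-controller-flamegraph.svg
```

In a graph produced this way, wide frames such as ofctrl_put() or the logical-flow processing paths mentioned above correspond directly to the fraction of sampled CPU time spent in those functions.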
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss