On Tue, Jun 5, 2018 at 12:00 PM, Mark Michelson <mmich...@redhat.com> wrote:
> On 06/05/2018 01:02 PM, Mark Michelson wrote:
>> On 06/05/2018 12:40 PM, Han Zhou wrote:
>>> On Fri, May 18, 2018 at 2:03 PM, Han Zhou <zhou...@gmail.com> wrote:
>>>
>>> Hi Mark,
>>>
>>> Thank you so much for sharing this data. Please see my comments inline.
>>>
>>> On Fri, May 18, 2018 at 1:31 PM, Mark Michelson <mmich...@redhat.com> wrote:
>>>
>>> Hi Han, I finally did some tests and looked at the CPU usage between
>>> master and the ip7 branch.
>>>
>>> On the machines running ovn-controller:
>>> Master branch: climbs to around 100% over the course of 3 minutes,
>>> oscillates close to 100% for about 10 minutes, and then is pegged at
>>> 100% for the rest of the test. Total test time was about 23 minutes.
>>> ip7 branch: oscillates between 10% and 25% for the first 10 minutes of
>>> the test, then hovers around 10% for the rest. Total test time was
>>> about 19 minutes.
>>>
>>> This is aligned with my observation of ~90% improvement in CPU cost.
>>>
>>> For throughput/total time, the improvement ratio is different (in my
>>> test case the execution time dropped by ~50%), but I think it can be
>>> explained. The total execution time does not accurately reflect the
>>> efficiency of the processing, because when CPU is at 100%,
>>> ovn-controller is slowed down, which may just result in fewer
>>> iterations during the whole test. The stop-watch profiling mechanism
>>> you implemented (also rebased into the incremental processing branch)
>>> will be able to tell the truth. The real impact is longer latency for
>>> handling a change in the control plane, so I also use latency to
>>> evaluate the improvement. The way I test latency is with
>>> ovn-nbctl --wait=hv, together with the nb_cfg improvement
>>> (https://patchwork.ozlabs.org/patch/899608/).
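Concretely, the `--wait=hv` style of latency measurement can be sketched as below. This is a rough illustration only: the switch and port names are placeholders, and it assumes a host where ovn-nbctl can reach the northbound database. `--wait=hv` makes the command return only after all hypervisors have caught up with the change (via nb_cfg), so the wall-clock time approximates end-to-end control-plane latency rather than just database commit time.

```shell
#!/bin/sh
# Sketch: measure how long a northbound change takes to be realized
# on every chassis, not just committed to the NB database.
# "ls1" and "lsp1" are placeholder names, not from the original test.

ovn-nbctl ls-add ls1

# --wait=hv blocks until all hypervisors report they have processed
# the change, so this timing includes ovn-northd and ovn-controller.
time ovn-nbctl --wait=hv lsp-add ls1 lsp1

# Clean up the test objects.
ovn-nbctl ls-del ls1
```

This requires the nb_cfg propagation fix referenced above to give accurate numbers on a multi-chassis deployment.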
>>> When I switched over to tests that have ACLs:
>>> Master branch: behaves about the same as when no ACLs are used. Total
>>> test time was about 28 minutes.
>>> ip7 branch: CPU usage hovered around 30% for the entirety of the test,
>>> with spikes around 50% a couple of times. Total test time was about
>>> 25 minutes.
>>>
>>> Since I had not done it yet, I also ran perf while running the
>>> incremental branch with ACLs. I am attaching the flame graph here. The
>>> gist is that, much like on the master branch, the majority of CPU time
>>> is spent processing logical flows.
>>>
>>> Seeing the drop in CPU usage between the master branch and the ip7
>>> branch makes me think it is worth investigating other areas that may
>>> be the bottleneck. I monitored memory, disk usage, and network usage
>>> on the machines, but I didn't see anything that looked like an obvious
>>> cause of delay.
>>>
>>> The CPU drop between master and ip7 when testing with ACLs is, to my
>>> understanding, most likely because incremental processing avoids
>>> recomputing flows when irrelevant input such as pinctrl/ofctrl
>>> messages (e.g. probe/echo) arrives, while on master any of these
>>> inputs would trigger a recompute.
>>>
>>> CPU-wise, I think the biggest improvements that can be made to the
>>> incremental processing branch are:
>>> * Adding a change handler for the Address_Set table.
>>> * The ofctrl_put() improvements we have discussed.
>>>
>>> I think these will noticeably improve our test times. However, based
>>> on how much the CPU usage dropped just from switching to the
>>> incremental processing branch, I think there are likely some other
>>> bottlenecks in our tests that would be more impactful to remove. We
>>> already know that "ovn_network.bind_port" and "ovn_network.wait_port_up"
>>> (in ovn-scale-test terminology) are the operations in our test
>>> iterations that take the longest.
>>> If we can break those down into smaller pieces, we can potentially
>>> zero in on what to target next.
>>>
>>> I am not sure whether there are any other *big* bottlenecks, but the
>>> address-set/port-group and ofctrl_put() improvements are surely
>>> needed :)
>>> The latest patch I provided is from my ip9 branch, which was rebased
>>> on master this week, with some code refactors. Feel free to try it,
>>> but don't expect any performance difference.
>>>
>>> Hi Mark,
>>>
>>> Do you still have the same environment to try out the address-set
>>> incremental processing patches, to see if they improve the test
>>> results for ACLs with per-port address-set updates?
>>> The patch is v3:
>>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=48060
>>> It is also in branch ip11.
>>>
>>> Thanks,
>>> Han
>>
>> As a matter of fact, I saw the ip11 branch this past Friday and gave it
>> a test during the weekend. I didn't run perf during the test, but based
>> solely on the time the test took to run, it was improved. For the test,
>> I ran with 3312 iterations. In the results I reported earlier in this
>> thread, we were doing 864 iterations, so I don't have an
>> apples-to-apples comparison at the moment. I will run an 864-iteration
>> test and see how it compares to the earlier numbers. I'll report back
>> when I have numbers.
>
> I ran the test with ACLs with 864 iterations. The results are nearly
> exactly the same as when I had run the ip7 branch with no ACLs. That is,
> it took around 19 minutes to run the test, and the CPU usage hovered
> around 10% for the test. I also ran perf. The flame graph shows what we
> would expect by this point: the majority of processing time in
> ovn-controller is spent in ofctrl_put().
>
> So I'd say that address set incremental processing is successful in our
> tests. Great job!

Wonderful news! Thanks a lot Mark, and I will add the numbers and
tested-by in the commit message when I submit v4.
(cc Ben since he is reviewing the patch)
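For anyone reproducing the profiling discussed in this thread, the flame graphs were presumably generated with a workflow along these lines; the exact invocations are not shown above, so treat the FlameGraph script location, sampling duration, and output name as assumptions:

```shell
#!/bin/sh
# Sketch: sample ovn-controller's call stacks with perf and render a
# flame graph. Assumes Brendan Gregg's FlameGraph scripts are checked
# out in ~/FlameGraph; the 60-second window is arbitrary.

# Record call-graph samples from the running ovn-controller process.
perf record -g -p "$(pidof ovn-controller)" -- sleep 60

# Fold the recorded stacks and render an interactive SVG flame graph.
perf script | ~/FlameGraph/stackcollapse-perf.pl \
    | ~/FlameGraph/flamegraph.pl > ovn-controller-flamegraph.svg
```

In a graph produced this way, wide frames such as ofctrl_put() or the logical-flow processing paths mentioned above correspond directly to the fraction of sampled CPU time spent in those functions.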
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss