At Mon, 18 Jan 2016 12:12:28 +0900, IWAMOTO Toshihiro wrote:
>
> I'm sending out this mail to share the findings and discuss how to
> improve things with those interested in neutron OVS performance.
>
> TL;DR: The native of_interface code, which has been merged recently
> and isn't the default, seems to consume less CPU time but gives mixed
> results. I'm looking into this for improvement.
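For anyone who wants to switch between the two modes and repeat the
comparison: of_interface is an OVS agent option. To the best of my
knowledge it lives in the [ovs] section of the agent's config file
(ml2_conf.ini or openvswitch_agent.ini, depending on the deployment),
so treat the snippet below as a sketch rather than a definitive recipe:

    [ovs]
    # 'ovs-ofctl' is the current default; 'native' selects the new driver
    of_interface = native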
I went on to look at implementation details of eventlet etc., but it
turned out to be fairly simple: the OVS agent in of_interface=native
mode waits for an OpenFlow connection from ovs-vswitchd, which can take
up to 5 seconds.

Please look at the attached graph. The x-axis is time since the agent
restart, the y-axis is the number of ports processed (in treat_devices
and bind_devices). Each port is counted twice; the first slope is
treat_devices and the second is bind_devices. The native of_interface
needs somewhat more time on start-up, but its bind_devices is about 2x
faster. The data were collected with 160 VMs and the devstack default
settings.

> * Introduction
>
> With an ML2+ovs Neutron configuration, openflow rule modification
> happens often and is a somewhat heavy operation as it involves
> exec() of the ovs-ofctl command.
>
> The native of_interface driver doesn't use the ovs-ofctl command and
> should have less performance impact on the system. This document
> tries to confirm this hypothesis.
>
> * Method
>
> In order to focus on openflow rule operation time and avoid noise from
> other operations (VM boot-up, etc.), neutron-openvswitch-agent was
> restarted and the time it took to reconfigure the flows was measured.
>
> 1. Use devstack to start a test environment. As debug logs generate a
>    considerable amount of load, ENABLE_DEBUG_LOG_LEVEL was set to false.
> 2. Apply https://review.openstack.org/#/c/267905/ to enable
>    measurement of flow reconfiguration times.
> 3. Boot 80 m1.nano instances. In my setup, this generates 404 br-int
>    flows. If you have >16G RAM, more could be booted.
> 4. Stop neutron-openvswitch-agent and restart it with the --run-once arg.
>    Use time, oprofile, and python's cProfile (use the --profile arg) to
>    collect data.
>
> * Results
>
> Execution time (averages of 3 runs):
>
>   native     28.3s  user 2.9s   sys 0.4s
>   ovs-ofctl  25.7s  user 2.2s   sys 0.3s
>
> ovs-ofctl runs faster and seems to use less CPU, but the above doesn't
> include the execution time of ovs-ofctl itself.

With 160 VMs and debug=false for the OVS agent and neutron-server,
execution time (averages and SDs of 10 runs):

  native     56.4+-3.4s  user 8.7+-0.1s   sys 0.82+-0.04s
  ovs-ofctl  55.9+-1.0s  user 6.9+-0.08s  sys 0.67+-0.05s

To exclude the OpenFlow connection wait, the times between the log
messages "Loaded agent extensions" and "Configuration for devices up
completed" were also compared (a small sketch for extracting these
timestamps follows the quoted oprofile results below):

  native     48.2+-0.49s
  ovs-ofctl  53.2+-0.99s

Here the native of_interface is the clear winner.

> Oprofile data collected by running "operf -s -t" contain this
> information.
>
> With of_interface=native config, "opreport tgid:<pid of ovs agent>" shows:
>
>     samples|      %|
>   ------------------
>       87408 100.000 python2.7
>         CPU_CLK_UNHALT...|
>           samples|      %|
>         ------------------
>             69160 79.1232 python2.7
>              8416  9.6284 vmlinux-3.13.0-24-generic
>
> and "opreport --merge tgid" doesn't show ovs-ofctl.
>
> With of_interface=ovs-ofctl, "opreport tgid:<pid of ovs agent>" shows:
>
>     samples|      %|
>   ------------------
>       62771 100.000 python2.7
>         CPU_CLK_UNHALT...|
>           samples|      %|
>         ------------------
>             49418 78.7274 python2.7
>              6483 10.3280 vmlinux-3.13.0-24-generic
>
> and "opreport --merge tgid" shows CPU consumption by ovs-ofctl:
>
>       35774  3.5979 ovs-ofctl
>         CPU_CLK_UNHALT...|
>           samples|      %|
>         ------------------
>             28219 78.8813 vmlinux-3.13.0-24-generic
>              3487  9.7473 ld-2.19.so
>              2301  6.4320 ovs-ofctl
>
> Comparing 87408 (native python) with 62771+35774 (ovs-ofctl python
> plus the ovs-ofctl processes), the native of_interface uses 0.4s less
> CPU time overall.
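For reference, below is a minimal sketch of how the "Loaded agent
extensions" -> "Configuration for devices up completed" delta can be
pulled out of the agent log. It is not the script used for the numbers
above; the log path and the oslo.log timestamp format are assumptions
based on a default devstack setup and may need adjusting:

    # log_delta.py - hypothetical helper; adjust LOGFILE to your setup
    import datetime
    import re

    LOGFILE = '/opt/stack/logs/q-agt.log'   # assumption: devstack agent log
    START_MSG = 'Loaded agent extensions'
    END_MSG = 'Configuration for devices up completed'
    # default oslo.log lines start with "YYYY-MM-DD HH:MM:SS.mmm"
    TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)')

    def timestamp_of(line):
        m = TS_RE.match(line)
        if m is None:
            return None
        return datetime.datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f')

    start = end = None
    with open(LOGFILE) as f:
        for line in f:
            if start is None and START_MSG in line:
                start = timestamp_of(line)
            elif END_MSG in line:
                end = timestamp_of(line)   # keep the last occurrence

    if start and end:
        print('flow reconfiguration took %.2fs' % (end - start).total_seconds())
    else:
        print('could not find both messages in %s' % LOGFILE)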
> * Conclusion and future steps
>
> The native of_interface uses slightly less CPU time but takes longer
> to complete a flow reconfiguration after an agent restart.
>
> As the OVS agent accounts for only 1/10th of total CPU usage during a
> flow reconfiguration (data not shown), there may be other areas for
> improvement.
>
> The cProfile Python module gives more fine-grained data, but no
> apparent performance bottleneck was found. The data show more
> eventlet context switches with the native of_interface, which is due
> to how the native of_interface is written. I'm looking into improving
> CPU usage and latency.
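Regarding the cProfile data mentioned above: if the --profile run ends
up writing a pstats-format dump, something along these lines is enough
to poke at it (the filename below is a placeholder, not the actual
output path of the patch under review):

    import pstats

    # placeholder path; point this at the dump produced by the profiled run
    stats = pstats.Stats('ovs-agent.profile')
    # cumulative time shows which call chains dominate the restart
    stats.strip_dirs().sort_stats('cumulative').print_stats(20)
    # tottime highlights where CPU time is actually burned, e.g. in
    # eventlet hub/switch overhead
    stats.sort_stats('tottime').print_stats(20)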
[Attachment: of_int-comparison.pdf (PDF) - graph of ports processed vs. time since agent restart]
