Hi folks,

As we're doing some performance tests in OpenStack using OVN, we noticed that as we keep creating ports, the time to create a single port increases. Also, ovn-northd CPU consumption is quite high (see [0], which shows the CPU consumption while creating 1000 ports and then deleting them; the last part, where CPU stays at 100%, is when all the ports get deleted).
With 500 ports in the same Logical Switch, I profiled OpenStack neutron-server while adding 10 more ports to that Logical Switch. Currently, neutron-server spawns several API workers (separate processes), each of which opens its own connection to the OVN NB database, so every update message sent by ovsdb-server is processed by all of them. For the profiling, I ran GreenletProfiler in all of those processes to produce a trace file per process, then merged them all to aggregate the results. In these tests I used the OVS master branch, compiled with shared libraries so that the JSON C parser is used.

Still, most of the time is spent in the following two modules:

- python/ovs/db/data.py: 33%
- uuid.py: 21%

For the data.py module, the breakdown (self time) is:

  Atom.__lt__       16.25%    8283 calls
  from_json:118      6.18%  406935 calls
  Atom.__hash__      3.48%  1623832 calls
  from_json:328      2.01%    5040 calls

And for the uuid module:

  UUID.__cmp__      12.84%  3570975 calls
  UUID.__init__      4.06%  362541 calls
  UUID.__hash__      2.96%    1800 calls
  UUID.__str__       1.03%  355016 calls

Most of the calls to Atom.__lt__ come from BaseOvnIdl.__process_update2 (idl.py) -> BaseOvnIdl.__row_update (idl.py) -> Datum.__cmp__ (data.py) -> Atom.__cmp__ (data.py).

The aggregated number of calls to BaseOvnIdl.__process_update2 is 1400 (and we're updating only 10 ports!), while the total number of connections opened to the NB database is 10:

  # netstat -np | grep 6641 | grep python | wc -l
  10

(Bear in mind that the results above were aggregated across all processes.)

It looks like the main culprit for this explosion is the way we handle ACLs: every time we create a port, it belongs to a Neutron security group (an OVN Address Set), and we add a new ACL for every Neutron security group rule. If we patch the code to skip the ACL part, the time to create a port remains stable over time.
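To make the numbers above a bit more concrete, here is a hypothetical micro-sketch (not the actual IDL code; the wrapper class and sizes are mine) of why set-valued columns with many row references translate into a flood of per-UUID orderings, which is where the Atom.__lt__ / UUID.__cmp__ self time piles up:

```python
# Hypothetical micro-sketch: when a set-valued column (e.g. a Logical
# Switch's ACL set) changes, the IDL compares/sorts the column's UUID
# atoms, costing one ordering call per element pair.
import uuid

comparisons = 0

class CountingAtom:
    """Wraps a UUID and counts ordering calls, mimicking the Atom.__lt__
    hot spot seen in the profile."""
    def __init__(self, u):
        self.u = u
    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.u < other.u

# A column referencing 500 rows (say, one ACL per port): merely sorting
# its value costs O(n log n) UUID comparisons on every row update.
column = [CountingAtom(uuid.uuid4()) for _ in range(500)]
sorted(column)
print(comparisons)  # typically a few thousand comparisons for one column
```

Multiply that by one update notification per API worker and the millions of aggregated UUID.__cmp__ calls stop looking surprising.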
From the comparison tests against ML2/OVS (the reference implementation), OVN outperforms it in most operations except port creation, where we can see it becoming a bottleneck.

Before optimizing/redesigning the ACL part, we could change the way we handle notifications from OVSDB. For example, instead of having multiple processes receive *all* notifications, we could have a single process subscribed to them that sends a more optimized (already parsed) multicast notification to all listening processes, which would use it to keep their own in-memory copies of the DB up to date. All processes would still connect to the NB database in "write-only" mode to commit their transactions, however.

Even though this last paragraph would best fit on the OpenStack ML, I want to raise it here for feedback and to see if someone can spot an "immediate" optimization to the way we process notifications from OVSDB. Maybe a Python binding to do it in C? :)

Any feedback, comments or suggestions are highly appreciated :)

Best,
Daniel Alvarez

[0] https://snapshot.raintank.io/dashboard/snapshot/dwbhn0Z1zVTh9kI5j6mCVySx8TvrP45m?orgId=2
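P.S. For concreteness, a minimal sketch of the single-subscriber fan-out idea above (all names are mine, and threads stand in for the separate neutron-server worker processes; the real design would use some local IPC between processes): one subscriber parses each raw JSON update from ovsdb-server exactly once and broadcasts the already-parsed object, so the expensive parse/compare work is no longer repeated per API worker.

```python
# Hypothetical sketch of a single-subscriber notification fan-out.
# Threads stand in for worker processes; queues stand in for IPC.
import json
import queue
import threading

def notifier(raw_updates, worker_queues):
    """Parse each raw ovsdb-server notification once, then broadcast it."""
    for raw in raw_updates:
        parsed = json.loads(raw)      # parse exactly once, centrally
        for q in worker_queues:       # fan out the already-parsed object
            q.put(parsed)
    for q in worker_queues:
        q.put(None)                   # sentinel: stream finished

def worker(q, out):
    """Keep a local in-memory DB copy up to date from pre-parsed updates."""
    local_db = {}
    while True:
        update = q.get()
        if update is None:
            break
        local_db.update(update)       # stand-in for real IDL row updates
    out.append(len(local_db))

raw = ['{"lsp-1": "up"}', '{"lsp-2": "down"}']
queues = [queue.Queue() for _ in range(3)]    # e.g. 3 API workers
results = []
threads = [threading.Thread(target=worker, args=(q, results)) for q in queues]
for t in threads:
    t.start()
notifier(raw, queues)
for t in threads:
    t.join()
print(results)  # [2, 2, 2]: every worker applied both updates
```

The point of the design is that json.loads (and any row diffing) runs once in the notifier instead of once per worker, at the cost of one extra hop for each update.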
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss