Hi,
We ran a scalability test case to exercise lport creation and binding. The git version is 06d4d4b on the master branch, with a patch to disable alive probe messages between ovsdb-server and ovn-controller.

The test environment is deployed as below:
- a rally node, used to run the test case
- an OVN control node, used to run ovn-northd, the OVN northbound ovsdb-server, and the OVN southbound ovsdb-server
- 11 farm nodes, used to run sandboxes

Each node is a bare-metal machine with the following hardware spec:
- Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz x 4, 40 cores in total
- 251G memory

Sandboxes are divided into groups of 50 sandboxes each. Farm node 0 and farm node 1 have one group each; every other farm node has two groups. That is, 1000 sandboxes (20 groups) are created on 11 bare-metal machines.

The test steps are as below:
1. Create 1000 sandboxes in bridge mode, that is, an additional bridge 'br0' is created and identified by external-ids:ovn-bridge-mappings.
2. On the OVN northbound database, create 5 lswitches and create 200 lports for each lswitch. An additional lport with type 'localnet' is added to each lswitch.
3. For each lport created in step 2, bind it to one of a group's sandboxes randomly, then use 'ovn-nbctl wait-until Logical_Port <port-name> up=true' to wait until the port is 'up' in the northbound database.
4. Go to step 2.

Here is a brief test result:

lport    1k lports   ovn-northd   ovsdb-server     ovsdb-server     ovnnb.db   ovnsb.db
number   bind time   memory(kB)   northbound(kB)   southbound(kB)   (kB)       (kB)
 1000        ???         6416         294784            8716            372       1519
 2000    484.526         9872        1089264            1188            742       2549
 3000    594.438        13484        2385536           12476           1111       3578
 4000    685.491        17736        4176240           14920           1481       4608
 5000    872.705        21704        6476420           17424           1851       5638
 6000    958.363        25580        9272100           19844           2220       6668
 7000   1142.056        29472       12561300           22268           2590       7698
 8000   1258.395        33780       16346944           24676           2960       8728
 9000   1446.025        37680       20653952           27184           3330       9757
10000   1567.252        41680       25446148           31808           3699       8364
11000   1800.981        45824       30750804           34248           4069       9394
12000   1940.967        49624       36541272           36408           4439       9873
13000   2117.564        53640       42843712           39108           4808      10681
14000   2231.282        57076       49627496          125672           5178      11465
15000   2448.585        61600       56893928          133864           5548      12271
16000   2614.816        65832       64678388          142184           5918      13040
17000   2839.524        69984       72993472          150816           6287      13831
18000   2952.906        73924       81802688          160484           6657      14630
19000   3143.878        77932       91138948          168676           7027      15444
20000   1529.746        81844      100955012          176868           7397      16233

Details:
- The 'lport number' column is the number of lports already created. Each row is the result of creating and binding 1000 lports.
- The '1k lports bind time' column is the total time to bind 1000 lports to sandboxes; lports are bound one by one. For each lport, the time consists of (a sketch of these two commands follows below):
  - ssh to a farm node, use ovs-vsctl to add the lport to 'br-int' and update the Interface table
  - ssh to the control node, use ovn-nbctl to wait until the lport's 'up' column is 'true' in the Logical_Port table
  If we create only one sandbox, one lswitch and one lport, then bind the lport to the sandbox, the time is about 100ms.
- The 'ovn-northd', 'ovsdb-server northbound' and 'ovsdb-server southbound' columns are the memory usage of these 3 processes, in kB.
- The 'ovnnb.db' and 'ovnsb.db' columns are the file sizes of the DB files, in kB.
- The last row shows a bind time about half that of the previous row, because the last 1000 lports are bound to sandboxes on farm node 0, which has only 50 sandboxes, while the previous 1000 lports are bound to farm node 10, which has 100 sandboxes. While lports are being bound to a sandbox, that sandbox's ovn-controller CPU usage becomes very high for several seconds.
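For reference, a minimal sketch of the per-lport bind-and-wait step described above is given here; the interface and port names (tap-0-1, lport-0-1) are placeholders, and in the real test the first command runs over ssh on a farm node's sandbox while the second runs on the control node:

    # on the farm node: attach an interface to br-int and point it at the lport,
    # so ovn-controller can claim the binding
    ovs-vsctl add-port br-int tap-0-1 -- \
        set Interface tap-0-1 external_ids:iface-id=lport-0-1

    # on the control node: block until the binding has propagated and
    # the lport's 'up' column is 'true' in the northbound database
    ovn-nbctl wait-until Logical_Port lport-0-1 up=true

The per-lport bind time in the table is roughly the wall-clock time of these two commands, including the ssh round-trips.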
After 3k lports are bound, total CPU usage on the farm node whose sandboxes are receiving the bindings reaches about 100%. This would not happen in a production environment, where each bare-metal host runs only one ovn-controller.

The OVN northbound ovsdb-server's memory usage grows very fast; we are looking into this problem.

The test case is implemented as a plugin for openstack/rally; it is available at https://github.com/l8huang/rally

BR
Huang Lei