Hi,

We ran a scalability test case for lport creation and binding.


The git version is 06d4d4b on the master branch, with a patch to disable
inactivity probe messages between ovsdb-server and ovn-controller.
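
For reference, newer ovn-controller builds expose this probe as a
configuration knob instead of requiring a patch. A minimal sketch, assuming
the external-ids:ovn-remote-probe-interval option is available in the build:

    # On each node running ovn-controller, disable the inactivity probe
    # of its connection to the southbound ovsdb-server (0 disables it;
    # the value is in milliseconds).
    ovs-vsctl set Open_vSwitch . external-ids:ovn-remote-probe-interval=0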


The test environment is deployed as follows:

- a rally node, used to run the test case

- an OVN control node, used to run ovn-northd, the OVN northbound
ovsdb-server and the OVN southbound ovsdb-server

- 11 farm nodes, used to run sandboxes


Each node is a bare-metal server with the following hardware spec:

- Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz x 4 (40 cores in total)

- 251 GB memory


Sandboxes are divided into groups of 50 sandboxes each. Farm nodes 0 and 1
have one group each; every other farm node has two groups. That is, 1000
sandboxes (20 groups) are created on the 11 bare-metal servers.


The test steps are as follows (a command sketch follows the list):

1. Create 1000 sandboxes in bridge mode; that is, an additional bridge
'br0' is created on each sandbox and registered via
external-ids:ovn-bridge-mappings.

2. On the OVN northbound DB, create 5 lswitches and 200 lports for each
lswitch. An additional lport of type 'localnet' is added to each lswitch.

3. For each lport created in step 2, bind it to a randomly chosen sandbox
in one group, then use 'ovn-nbctl wait-until Logical_Port <port-name>
up=true' to wait until the port is 'up' in the northbound DB.

4. Go to step 2.
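
To make the steps concrete, here is a rough command sketch. All names such
as 'physnet0', 'sw0', 'sw0-p0' and 'vif0' are made up for illustration, and
the exact ovn-nbctl sub-commands depend on the code version:

    # Step 1: on each sandbox, create br0 and register the bridge mapping.
    ovs-vsctl add-br br0
    ovs-vsctl set Open_vSwitch . \
        external-ids:ovn-bridge-mappings=physnet0:br0

    # Step 2: on the control node, create a lswitch, its lports, and one
    # extra lport of type 'localnet'.
    ovn-nbctl lswitch-add sw0
    ovn-nbctl lport-add sw0 sw0-p0
    ovn-nbctl lport-add sw0 sw0-localnet
    ovn-nbctl lport-set-type sw0-localnet localnet
    ovn-nbctl lport-set-options sw0-localnet network_name=physnet0

    # Step 3: on a randomly chosen sandbox, bind the lport by adding an
    # interface to br-int whose iface-id matches the lport name...
    ovs-vsctl add-port br-int vif0 -- \
        set Interface vif0 external-ids:iface-id=sw0-p0

    # ...then, on the control node, wait until the port is reported up.
    ovn-nbctl wait-until Logical_Port sw0-p0 up=true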



Here is a brief summary of the test results.


lport    1k lports      ovn-northd   ovsdb-server    ovsdb-server   ovnnb.db   ovnsb.db
number   bind time(s)   memory(kB)   northbound(kB)  southbound(kB)     (kB)       (kB)
 1000         ???             6416          294784            8716       372       1519
 2000      484.526             9872         1089264            1188       742       2549
 3000      594.438            13484         2385536           12476      1111       3578
 4000      685.491            17736         4176240           14920      1481       4608
 5000      872.705            21704         6476420           17424      1851       5638
 6000      958.363            25580         9272100           19844      2220       6668
 7000     1142.056            29472        12561300           22268      2590       7698
 8000     1258.395            33780        16346944           24676      2960       8728
 9000     1446.025            37680        20653952           27184      3330       9757
10000     1567.252            41680        25446148           31808      3699       8364
11000     1800.981            45824        30750804           34248      4069       9394
12000     1940.967            49624        36541272           36408      4439       9873
13000     2117.564            53640        42843712           39108      4808      10681
14000     2231.282            57076        49627496          125672      5178      11465
15000     2448.585            61600        56893928          133864      5548      12271
16000     2614.816            65832        64678388          142184      5918      13040
17000     2839.524            69984        72993472          150816      6287      13831
18000     2952.906            73924        81802688          160484      6657      14630
19000     3143.878            77932        91138948          168676      7027      15444
20000     1529.746            81844       100955012          176868      7397      16233


Details:

- The 'lport number' column is the number of lports already created. Each
row is the result of creating and binding 1000 lports.

- The '1k lports bind time' column is the total time, in seconds, to bind
1000 lports to sandboxes; lports are bound one by one (see the loop sketch
after this list). For each lport, the time consists of:

    - ssh to a farm node, then use ovs-vsctl to add the lport to 'br-int'
and update the Interface table

    - ssh to the control node, then use ovn-nbctl to wait until the
lport's 'up' column is 'true' in the Logical_Port table

    If we create only one sandbox, one lswitch and one lport, then bind
the lport to the sandbox, the time is about 100ms.

- The 'ovn-northd', 'ovsdb-server northbound' and 'ovsdb-server southbound'
columns are the memory usage of these 3 processes, in kB.

- The 'ovnnb.db' and 'ovnsb.db' columns are the sizes of the DB files, in kB.

- The last row shows a bind time about half that of the previous row,
because the last 1000 lports were bound to sandboxes on farm node 0, which
has only 50 sandboxes, while the previous 1000 lports were bound to farm
node 10, which has 100 sandboxes.
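
In other words, each row's bind time is roughly what the following loop
would measure. This is a simplified sketch that ignores sandbox selection;
FARM_NODE, CONTROL_NODE and the 'lp-N' lport names are placeholders, and
the real test is driven by the rally plugin mentioned at the end of this
mail:

    # Bind 1000 lports one by one and record the total wall-clock time.
    start=$(date +%s.%N)
    for i in $(seq 0 999); do
        lport="lp-$i"
        ssh "$FARM_NODE" ovs-vsctl add-port br-int "vif-$i" -- \
            set Interface "vif-$i" external-ids:iface-id="$lport"
        ssh "$CONTROL_NODE" ovn-nbctl wait-until Logical_Port "$lport" up=true
    done
    end=$(date +%s.%N)
    echo "1k lports bind time: $(echo "$end - $start" | bc) s"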



While binding lports to sandboxes, ovn-controller's CPU usage becomes very
high for several seconds. After 3k lports are bound, the total CPU usage on
the farm node whose sandboxes the lports are being bound to reaches about
100%. In a production environment this would not happen, since each
bare-metal server runs only one ovn-controller.
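
A quick way to watch this on a farm node is to sample the per-process CPU
usage of all ovn-controller instances (a sketch, assuming pidstat from the
sysstat package is installed):

    # Print CPU usage of every ovn-controller process once per second.
    pidstat -u -p "$(pgrep -d, ovn-controller)" 1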


The OVN northbound ovsdb-server's memory usage grows very fast; we are
looking into the problem.
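
Besides RSS, ovsdb-server can report its own memory accounting via appctl,
which may help narrow this down. A sketch; the control socket path below is
just an example and depends on how the northbound ovsdb-server was started:

    # Ask the northbound ovsdb-server for its internal memory usage
    # counters (e.g. cells, monitors, sessions).
    ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl memory/show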


The test case is implemented as a plugin for openstack/rally; it is
available at https://github.com/l8huang/rally


BR

Huang Lei