Hi, Ben, 

>We're always excited to improve the performance of OVS, so I hope you
>will pass along your results.
     We did some evaluations on DPDK-based OVS. We used ClassBench [1] to
generate rulesets of 1K and 10K rules, and also generated synthetic traffic
for these rules. We chose to generate low-locality traffic because we think
the cache already works well when locality is high.


    We used the OVS 2.4 release with DPDK 2.0. The network card is an
Intel 82599 (10G). We ran OVS on an Intel Xeon processor (2.2GHz, 4 cores);
all the evaluations were performed on a single core. Every rule is
associated with an action that forwards the packet from its input port to
a fixed output port. The test traffic is one-way.


    The results are as follows:
    
    | ruleset | tx rate/port (Mbps) | rx rate/port (Mbps) |
    | fw_1k   | 9990                | 49                  |
    | fw_10k  | 9991                | 16                  |
    | acl_1k  | 9995                | 207                 |
    | acl_10k | 9994                | 94                  |
    | ipc_1k  | 9991                | 81                  |
    | ipc_10k | 9995                | 18                  |


    These results show that under low-locality traffic, the performance of
OVS is low. We further checked the cache miss rate: about 50% of the packets
miss the first layer of cache and are matched against the second layer;
however, very few packets are sent to upcalls. We also checked the number of
megaflow tuples, and it is quite large (around 100 to 1000 tuples).
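
    To illustrate why a large tuple count hurts, here is a minimal Python
sketch of tuple space search. It is a toy model, not the OVS datapath
classifier code: the two-field key, the TupleSpace class, and the probe
counter are all assumptions made for the example. It assumes, as in a
megaflow cache, that entries do not overlap, so the first hit can be
returned.

```python
from typing import Dict, Optional, Tuple

# Simplified two-field packet key: (dst_ip, dst_port). Hypothetical layout.
Key = Tuple[int, int]


class TupleSpace:
    """Toy tuple space search: one hash table per distinct mask."""

    def __init__(self) -> None:
        # mask -> {masked key -> action}
        self.tuples: Dict[Key, Dict[Key, str]] = {}

    def insert(self, mask: Key, key: Key, action: str) -> None:
        masked = (key[0] & mask[0], key[1] & mask[1])
        self.tuples.setdefault(mask, {})[masked] = action

    def lookup(self, key: Key) -> Tuple[Optional[str], int]:
        """Return (action, number of tuples probed)."""
        probes = 0
        for mask, table in self.tuples.items():
            probes += 1
            # Each tuple costs one masking step plus one hash probe.
            masked = (key[0] & mask[0], key[1] & mask[1])
            action = table.get(masked)
            if action is not None:
                return action, probes
        return None, probes
```

    In this model a packet that matches the last tuple, or no tuple at all,
pays one hash probe per tuple, so with hundreds of tuples the second-layer
lookup cost grows roughly linearly with the tuple count.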


    So we decided to use a trie to prune these tuples and improve
performance. That is where we found that the bit mask could be
discontiguous. We fill these discontiguous bits and use a fast trie
algorithm (Tree Bitmap) on destination IP addresses to prune the tuples.
The results are below:
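
    As a rough illustration of the idea, here is a Python sketch under
stated assumptions. It reads "filling" as setting every bit above the
lowest set bit of the mask, which yields a contiguous prefix, and it uses a
plain binary trie instead of Tree Bitmap to keep the example short. The
names fill_mask, prefix_len, and PruningTrie are hypothetical and are not
taken from our patch or from OVS.

```python
from typing import List, Optional, Set

IP_BITS = 32


def fill_mask(mask: int) -> int:
    """Set every bit above the lowest set bit, giving a contiguous prefix."""
    if mask == 0:
        return 0
    lowest = mask & -mask  # isolate the lowest set bit
    return ((1 << IP_BITS) - 1) & ~(lowest - 1)


def prefix_len(mask: int) -> int:
    """Number of one bits; equals the prefix length for a contiguous mask."""
    return bin(mask).count("1")


class Node:
    def __init__(self) -> None:
        self.children: List[Optional["Node"]] = [None, None]
        self.tuple_ids: Set[int] = set()


class PruningTrie:
    """Binary trie on destination IP prefixes. Each node records the tuples
    that hold at least one flow with exactly that prefix."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, dst_ip: int, mask: int, tuple_id: int) -> None:
        node = self.root
        for i in range(prefix_len(mask)):
            bit = (dst_ip >> (IP_BITS - 1 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.tuple_ids.add(tuple_id)

    def candidates(self, dst_ip: int) -> Set[int]:
        """Tuples whose prefix matches dst_ip; all others can be skipped."""
        node = self.root
        found = set(node.tuple_ids)
        for i in range(IP_BITS):
            bit = (dst_ip >> (IP_BITS - 1 - i)) & 1
            node = node.children[bit]
            if node is None:
                break
            found |= node.tuple_ids
        return found
```

    For example, fill_mask(0xFF00FF00) gives 0xFFFFFF00, a /24 prefix. After
inserting 10.0.0.0/8 for tuple 0 and 10.1.2.0/24 for tuple 1, a packet to
10.1.2.5 only needs to probe tuples 0 and 1, and a packet to 8.8.8.8 can
skip both.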


    | rule    | Rx rate/port (Mbps), native_ovs | Rx rate/port (Mbps), ovs_trie | speedup |
    | acl_1k  | 207                             | 582                           | 2.81    |
    | acl_10k | 94                              | 245                           | 2.61    |
    | fw_1k   | 49                              | 196                           | 4.00    |
    | fw_10k  | 16                              | 113                           | 7.06    |
    | ipc_1k  | 81                              | 510                           | 6.30    |
    | ipc_10k | 18                              | 256                           | 14.22   |
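
    The speedup column is the ovs_trie rate divided by the native_ovs rate.
The short Python check below recomputes it from the Rx rates reported in
this mail; the dictionary name and layout are only for illustration.

```python
# Rx rate/port in Mbps from the results above: (native_ovs, ovs_trie)
rx_rates = {
    "acl_1k": (207, 582),
    "acl_10k": (94, 245),
    "fw_1k": (49, 196),
    "fw_10k": (16, 113),
    "ipc_1k": (81, 510),
    "ipc_10k": (18, 256),
}

# speedup = ovs_trie / native_ovs, rounded to two decimals
speedups = {
    rule: round(trie / native, 2) for rule, (native, trie) in rx_rates.items()
}

for rule, speedup in speedups.items():
    print(f"{rule}: {speedup:.2f}x")
```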


    We also checked the effect of enlarging the first-layer cache. We
enlarged the cache to 32K entries; the results show that the performance
improvement is limited, around 30%.


    Any feedback is welcome. Thank you.
[1] David E. Taylor and Jonathan S. Turner. "ClassBench: A Packet
Classification Benchmark." IEEE INFOCOM 2005, 24th Annual Joint Conference
of the IEEE Computer and Communications Societies, Vol. 3. IEEE, 2005.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
