It's a long weekend in the US, so I will try to answer some of your questions
in Darrell's absence.

>Why do think having more than 64k per PMD would be optimal ?
>I originally thought that the bottleneck in classifier because it is saturated 
>full
>so that look up has to be going to flow table, so I think why not just increase
>the dpcls flows per PMD, but seems I am wrong based on your explanation.

For a few use cases, much of the bottleneck moves to the classifier once the
EMC is saturated. You may have to add more PMD threads (again, this depends on
the availability of cores in your case).
As your initial investigation showed that the classifier is the bottleneck, I
am curious about a few things:
     -  In the 'dpif-netdev/pmd-stats-show' output, what does
        'avg. subtable lookups per hit:' look like?
     -  In steady state, does 'dpcls_lookup()' top the list of functions in
        'perf top'?
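
To illustrate why the subtable-lookups-per-hit figure matters, here is a toy
model of a tuple-space-search classifier. It is only a sketch, not the OvS
dpcls code: the Subtable class, the three-field key and the sample rules are
all made up for illustration, and it assumes rules do not overlap, so the
first subtable that hits ends the lookup.

```python
# Toy tuple-space-search classifier (illustrative only, NOT OvS dpcls code).
# Each subtable holds rules that share one wildcard mask; a lookup probes the
# subtables in order until one of them hits.

class Subtable:
    def __init__(self, mask):
        self.mask = mask      # tuple of 0/1 flags, 1 = field is matched
        self.rules = {}       # masked key -> action
        self.hits = 0         # hit counter, used for ranking

    def masked(self, key):
        # Wildcarded fields are replaced by None before hashing.
        return tuple(f if m else None for f, m in zip(key, self.mask))

    def insert(self, key, action):
        self.rules[self.masked(key)] = action

    def lookup(self, key):
        return self.rules.get(self.masked(key))


def classify(subtables, key):
    """Return (action, number of subtables probed)."""
    for probed, st in enumerate(subtables, start=1):
        action = st.lookup(key)
        if action is not None:
            st.hits += 1
            return action, probed
    return None, len(subtables)


def avg_lookups_per_hit(subtables, packets):
    probes = hits = 0
    for key in packets:
        action, probed = classify(subtables, key)
        if action is not None:
            probes += probed
            hits += 1
    return probes / hits if hits else 0.0


# Hypothetical key layout: (src_ip, dst_ip, dst_port); one subtable per mask.
cold_a = Subtable((1, 0, 0))
cold_b = Subtable((0, 0, 1))
hot = Subtable((0, 1, 0))
cold_a.insert(("10.0.0.1", None, None), "drop")
cold_b.insert((None, None, 22), "to_controller")
hot.insert((None, "192.168.1.1", None), "output:2")

subtables = [cold_a, cold_b, hot]   # the busiest subtable is probed last
packets = ([("10.0.0.9", "192.168.1.1", 80)] * 8
           + [("10.0.0.1", "1.1.1.1", 80)] * 2)

before = avg_lookups_per_hit(subtables, packets)

# Ranking: probe the subtables that collected the most hits first.
subtables.sort(key=lambda st: st.hits, reverse=True)
after = avg_lookups_per_hit(subtables, packets)

print(f"avg subtable lookups per hit: before={before:.1f} after={after:.1f}")
```

In this toy traffic mix, most packets match the subtable that is probed last,
so moving it to the front lowers the average number of probes per hit. A high
value in the real counter suggests a similar situation.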

>What is your use case(s) ?
>My usecase might be setup a VBRAS VNF with OVS-DPDK as an NFV normal
>case, and it requires a good performance, however, OVS-DPDK seems still not
>reach its needs compared with  hardware offloading, we are evaluating VPP as
>well, 
As you mentioned VPP here, it's worth looking at the benchmarks comparing OvS
and VPP for the L3-VPN use case. They were carried out by Intel and Ericsson
and presented at the OvS Fall conference.
The slides can be found at
http://openvswitch.org/support/ovscon2016/8/1400-gray.pdf.

>basically I am looking to find out what's the bottleneck so far in OVS-DPDK
>(seems in flow look up), and if there are some solutions being discussed
>or working in progress.

I personally did some investigation in this area. One of the bottlenecks in the
classifier is the subtable lookup.
Murmur hash is used in OvS, and enabling intrinsics with
-march=native/CFLAGS="-msse4.2" is recommended if not already done.
If you have many subtables, the lookups may take significant cycles. I presume
you are using OvS 2.7. Some optimizations were done to improve classifier
performance (subtable ranking, hash optimizations).
If emc_lookup()/emc_insert() show up among the top 5 functions taking
significant cycles, it is worth disabling EMC as below.
          'ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=0'
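
As a rough illustration of what an inverse-probability setting like this one
means, the sketch below models a cache that inserts a flow with probability
1/N, with N=0 meaning "never insert". The function names are hypothetical and
this is not the OvS implementation, just a small simulation of the idea.

```python
import random


def should_insert(inv_prob, rng):
    """Decide whether a flow gets inserted into the (toy) cache.

    inv_prob == 0 -> never insert (insertion disabled)
    inv_prob == 1 -> always insert
    otherwise     -> insert with probability 1/inv_prob
    """
    if inv_prob == 0:
        return False
    return rng.randrange(inv_prob) == 0


def insert_rate(inv_prob, trials=100_000, seed=1):
    """Fraction of trials in which an insertion would happen."""
    rng = random.Random(seed)   # seeded so runs are repeatable
    inserted = sum(should_insert(inv_prob, rng) for _ in range(trials))
    return inserted / trials


for inv_prob in (0, 1, 100):
    print(f"inv_prob={inv_prob:3d} -> insert rate {insert_rate(inv_prob):.4f}")
```

With the value set to 0 nothing new is inserted, so every lookup that would
have gone through the cache is served by the classifier instead.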

>Are you wanting for this number to be larger by default ?
>I am not sure, I need to understand whether it is good or bad to set it larger.
>Are you wanting for this number to be configurable ?
>Probably good.
>
>BTW, after reading part of DPDK document, it strengthens to decrease to copy
>between cache and memory and get cache hit as much as possible to get
>fewer cpu cycles to fetch data, but now I am totally lost on how does OVS-
>DPDK emc and classifier map to the LLC.

I didn't get your question here. A PMD is like any other thread and has an EMC
and a classifier per ingress port.
The EMC, classifier subtables and other data structures will make it to the LLC
when accessed.

As already mentioned, with RDT Cache Allocation Technology (CAT) one can assign
cache ways to high-priority threads:
https://software.intel.com/en-us/articles/introduction-to-cache-allocation-technology

- Bhanuprakash.

