Thanks again, Bodireddy! Comments inline.

On Mon, Jul 3, 2017 at 5:00 PM, Bodireddy, Bhanuprakash <
bhanuprakash.bodire...@intel.com> wrote:
> It's a long weekend in the US, and I will try answering some of your
> questions in Darrell's absence.
>
> >Why do you think having more than 64k per PMD would be optimal?
> >I originally thought that the bottleneck is in the classifier because it
> >is saturated, so lookups have to go to the flow table; so I thought why
> >not just increase the dpcls flows per PMD, but it seems I am wrong based
> >on your explanation.
>
> For a few use cases much of the bottleneck moves to the classifier when
> the EMC is saturated. You may have to add more PMD threads (again, this
> depends on the availability of cores in your case). As your initial
> investigation showed the classifier is the bottleneck, I am just curious
> about a few things:
> - In the 'dpif-netdev/pmd-stats-show' output, what does the
>   'avg. subtable lookups per hit:' look like?
> - In steady state, does 'dpcls_lookup()' top the list of functions in
>   'perf top'?

Those are great suggestions; I'll check further.

> >What is your use case(s)?
> >My use case might be setting up a vBRAS VNF with OVS-DPDK, a normal NFV
> >case, and it requires good performance; however, OVS-DPDK still does not
> >seem to reach its needs compared with hardware offloading. We are
> >evaluating VPP as well.
>
> As you mentioned VPP here, it's worth looking at the benchmarks comparing
> OvS and VPP for the L3-VPN use case, carried out by Intel and Ericsson and
> presented at the OvS Fall conference. The slides can be found at
> http://openvswitch.org/support/ovscon2016/8/1400-gray.pdf.

In the above pdf, on page 12, why does the classifier show constant
throughput with an increasing number of concurrent L4 flows? Shouldn't the
performance degrade with more subtable lookups, as you mentioned?

> >Basically I am looking to find out what the bottleneck is so far in
> >OVS-DPDK (it seems to be flow lookup), and whether there are solutions
> >being discussed or work in progress.
>
> I personally did some investigation in this area.
> One of the bottlenecks in the classifier is due to sub-table lookup.
> Murmur hash is used in OvS, and it is recommended to enable intrinsics
> with -march=native / CFLAGS="-msse4.2" if not already done.
> If you have many subtables, the lookups may be taking significant cycles.
> I presume you are using OvS 2.7; some optimizations were done there to
> improve classifier performance (subtable ranking, hash optimizations).
> If emc_lookup()/emc_insert() show up in the top 5 functions taking
> significant cycles, it is worth disabling the EMC as below:
> 'ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=0'

Thanks much for your advice.

> >Are you wanting this number to be larger by default?
> >I am not sure; I need to understand whether it is good or bad to set it
> >larger.
> >Are you wanting this number to be configurable?
> >Probably good.
> >
> >BTW, after reading part of the DPDK documentation, it stresses reducing
> >copies between cache and memory and getting cache hits as much as
> >possible, so that fewer CPU cycles are spent fetching data; but now I am
> >totally lost on how the OVS-DPDK EMC and classifier map to the LLC.
>
> I didn't get your question here. A PMD is like any other thread and has an
> EMC and a classifier per ingress port. The EMC, classifier subtables and
> other data structures will make it to the LLC when accessed.

ACK.

> As already mentioned, using RDT Cache Allocation Technology (CAT), one can
> assign cache ways to high-priority threads:
> https://software.intel.com/en-us/articles/introduction-to-cache-allocation-technology
>
> - Bhanuprakash.
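For reference, the diagnostics and tunings discussed in this thread can be sketched as a short shell session. This is only a sketch, assuming an OvS 2.7-era build with the DPDK datapath and an SSE4.2-capable CPU; verify the commands and flags against your own version before relying on them:

```shell
# Sketch only -- assumes OvS 2.7 with the DPDK datapath; check the exact
# command names and build flags against your installed version.

# 1. Diagnostics: per-PMD counters (look at 'avg. subtable lookups per
#    hit' -- values well above 1 mean each packet probes several dpcls
#    subtables), resetting first for a clean steady-state sample.
ovs-appctl dpif-netdev/pmd-stats-clear
ovs-appctl dpif-netdev/pmd-stats-show

# 2. Profile: check whether dpcls_lookup() (or emc_lookup()/emc_insert())
#    dominates the cycle profile of the vswitchd process.
perf top -p "$(pidof ovs-vswitchd)"

# 3. Build with intrinsics enabled so the hash computation can use SSE4.2:
./configure CFLAGS="-g -O2 -msse4.2"   # or CFLAGS="-g -O2 -march=native"

# 4. Disable EMC insertion (an inverse probability of 0 means flows are
#    never inserted), so every packet exercises the dpcls classifier:
ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=0
```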
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss