> On Sep 6, 2018, at 7:10 AM, Saber Rezvani <irsa...@zoho.com> wrote:
> 
> On 08/29/2018 11:22 PM, Wiles, Keith wrote:
> 
> >> On Aug 29, 2018, at 12:19 PM, Saber Rezvani <irsa...@zoho.com> wrote:
> >>
> >> On 08/29/2018 01:39 AM, Wiles, Keith wrote:
> >>>> On Aug 28, 2018, at 2:16 PM, Saber Rezvani <irsa...@zoho.com> wrote:
> >>>>
> >>>> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
> >>>>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a performance problem.
> >>>> I use Pktgen version 3.0.0. It is fine as long as I use one core (10 Gb/s), but when I increase the number of cores (one core per queue) I lose some performance (roughly 8.5 Gb/s for 8 cores). In my scenario Pktgen shows it is generating at line rate, but receiving 8.5 Gb/s.
> >>>> Is it because of Pktgen?
> >>> Normally Pktgen can receive at line rate up to 10G with 64-byte frames, which means Pktgen should not be the problem. You can verify that by looping a cable from one port to another on the pktgen machine to create an external loopback. Then whatever traffic you send from one port you should be able to receive on the other, unless something is configured wrong.
> >>>
> >>> Please send me the command line for pktgen.
> >>>
> >>> In pktgen, if you have the config -m "[1-4:5-8].0" then you have 4 cores sending traffic and 4 cores receiving packets.
> >>>
> >>> In this case the TX cores will be sending packets on all 4 lcores to the same port. On the Rx side you have 4 cores polling 4 Rx queues. The Rx queues are controlled by RSS, which means the 5-tuple hash of the inbound traffic must spread the packets across all 4 queues to make sure each core is doing the same amount of work. If you are sending only a single packet (flow) on the Tx cores, then only one Rx queue will be used.
> >>>
> >>> I hope that makes sense.
> >> I think there is a misunderstanding of the problem. Indeed the problem is not Pktgen.
> >> Here is my command --> ./app/app/x86_64-native-linuxapp-gcc/pktgen -c ffc0000 -n 4 -w 84:00.0 -w 84:00.1 --file-prefix pktgen_F2 --socket-mem 1000,2000,1000,1000 -- -T -P -m "[18-19:20-21].0, [22:23].1"
> >>
> >> The problem is that when I run the symmetric_mp example with $numberOfProcesses=8 cores, I get lower throughput (roughly 8.4 Gb/s), but when I run it with $numberOfProcesses=3 cores the throughput is 10 Gb/s.
> >> for i in `seq $numberOfProcesses`;
> >> do
> >>     .... some calculation goes here.....
> >>     symmetric_mp -c $coremask -n 2 --proc-type=auto -w 0b:00.0 -w 0b:00.1 --file-prefix sm --socket-mem 4000,1000,1000,1000 -- -p 3 --num-procs=$numberOfProcesses --proc-id=$procid";
> >>     .....
> >> done
> > Most NICs have a limited amount of memory on the NIC, and when you start to segment that memory because you are using more queues it can affect performance.
> >
> > On one of the NICs, if you go over say 5 or 6 queues, the memory per queue for Rx/Tx packets starts to become a bottleneck, as you do not have enough memory in the Tx/Rx queues to hold enough packets. This can cause the NIC to drop Rx packets because the host cannot pull the data from the NIC, or from the Rx ring on the host, fast enough. This seems to be the problem, since the amount of time to process a packet on the host has not changed; only the amount of buffer space in the NIC changes as you increase the number of queues.
> >
> > I am not sure this is your issue, but I figured I would state this point.
> What you said sounded logical, but is there a way I can be sure? I mean, are there registers on the NIC which show the number of packets lost on the NIC, or does DPDK have an API which reports packet loss at the NIC level?
Yes, if you look in the docs at Readthedocs.org/projects/dpdk you can find the API, something like rte_eth_stats_get() (see the sketch at the end of this message).

> >
> >> I am trying to find out what causes this loss!
> >>
> >>>>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani <irsa...@zoho.com> wrote:
> >>>>>>
> >>>>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
> >>>>>>> On Tue, 28 Aug 2018 17:34:27 +0430 Saber Rezvani <irsa...@zoho.com> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I have run the multi_process/symmetric_mp example in the DPDK examples directory.
> >>>>>>>> With one process its throughput is line rate, but as I increase the number of cores I see a decrease in throughput. For example, if the number of queues is set to 4 and each queue is assigned to a single core, the throughput is about 9.4 Gb/s; with 8 queues, the throughput is about 8.5 Gb/s.
> >>>>>>>>
> >>>>>>>> I have read the following, but it was not convincing.
> >>>>>>>>
> >>>>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
> >>>>>>>>
> >>>>>>>> I am eagerly looking forward to hearing from you, all.
> >>>>>>>>
> >>>>>>>> Best wishes,
> >>>>>>>>
> >>>>>>>> Saber
> >>>>>>>>
> >>>>>>> Not completely surprising. If you have more cores than the packet line rate needs, then the number of packets returned by each call to rx_burst will be smaller. With a large number of cores, most of the time will be spent reading PCI registers for no packets!
> >>>>>> Indeed pktgen says it is generating traffic at line rate, but receiving less than 10 Gb/s. So, in that case there should be something that causes the reduction in throughput :(
> >>>>>>
> >>>>> Regards,
> >>>>> Keith
> >>>>>
> >>> Regards,
> >>> Keith
> >>>
> >> Best regards,
> >> Saber
> > Regards,
> > Keith
> >
> Best regards,
> Saber

Regards,
Keith
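
P.S. In case it helps, here is a minimal, untested sketch of what I mean, assuming a recent DPDK release (one where rte_eth_stats_get() returns an int and port ids are uint16_t). A growing imissed count is the usual sign that the NIC itself is dropping Rx packets because the host is not draining the Rx rings fast enough:

  #include <inttypes.h>
  #include <stdio.h>
  #include <rte_ethdev.h>

  /* Print the counters that indicate NIC-level loss for one port:
   *   imissed   - packets dropped by the hardware because the Rx queues were full
   *   ierrors   - erroneous received packets
   *   rx_nombuf - Rx mbuf allocation failures on the host
   */
  static void
  show_port_drops(uint16_t port_id)
  {
          struct rte_eth_stats stats;

          if (rte_eth_stats_get(port_id, &stats) != 0) {
                  printf("port %u: unable to read stats\n", port_id);
                  return;
          }
          printf("port %u: ipackets=%" PRIu64 " imissed=%" PRIu64
                 " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
                 port_id, stats.ipackets, stats.imissed,
                 stats.ierrors, stats.rx_nombuf);
  }

If the basic stats are not detailed enough, rte_eth_xstats_get() exposes the extended counters, which on many PMDs include per-queue numbers, so you can see whether one queue is dropping more than the others.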
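
P.P.S. On the per-queue buffering point above, one thing you can also check (again only a sketch under the same assumptions, and not necessarily the fix here) is how many descriptors per queue the port supports versus what the application actually requests in rte_eth_rx_queue_setup()/rte_eth_tx_queue_setup(). If the configured rings are far below the limit, larger rings can sometimes absorb bursts better when many queues share the NIC:

  #include <stdio.h>
  #include <rte_ethdev.h>

  /* Report the queue counts and per-queue descriptor limits the device advertises. */
  static void
  show_queue_limits(uint16_t port_id)
  {
          struct rte_eth_dev_info dev_info;

          rte_eth_dev_info_get(port_id, &dev_info);
          printf("port %u: max_rx_queues=%u rx_desc_max=%u tx_desc_max=%u\n",
                 port_id, (unsigned)dev_info.max_rx_queues,
                 (unsigned)dev_info.rx_desc_lim.nb_max,
                 (unsigned)dev_info.tx_desc_lim.nb_max);
  }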