> On Sep 6, 2018, at 7:10 AM, Saber Rezvani <irsa...@zoho.com> wrote:
> 
> On 08/29/2018 11:22 PM, Wiles, Keith wrote:
> 
> >> On Aug 29, 2018, at 12:19 PM, Saber Rezvani <irsa...@zoho.com> wrote:
> >>
> >> On 08/29/2018 01:39 AM, Wiles, Keith wrote:
> >>>> On Aug 28, 2018, at 2:16 PM, Saber Rezvani <irsa...@zoho.com> wrote:
> >>>>
> >>>> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
> >>>>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a performance problem.
> >>>> I use Pktgen version 3.0.0. It is fine as long as I use one core (10 Gb/s), but when I increase the number of cores (one core per queue) I lose some performance (roughly 8.5 Gb/s for 8 cores). In my scenario Pktgen shows it is generating at line rate, but receiving 8.5 Gb/s.
> >>>> Is it because of Pktgen?
> >>> Normally Pktgen can receive at line rate up to 10G with 64-byte frames, which means Pktgen should not be the problem. You can verify that by looping a cable from one port to another on the pktgen machine to create an external loopback. Then whatever traffic you send from one port you should be able to receive on the other, unless something is configured wrong.
> >>>
> >>> Please send me the command line for pktgen.
> >>>
> >>> In pktgen, if you have the config -m "[1-4:5-8].0" then you have 4 cores sending traffic and 4 cores receiving packets.
> >>>
> >>> In this case the TX cores will be sending packets on all 4 lcores to the same port. On the Rx side you have 4 cores polling 4 Rx queues. The Rx queues are controlled by RSS, which means the 5-tuple hash of the inbound traffic must spread the packets across all 4 queues to make sure each core is doing the same amount of work. If you are sending only a single packet (flow) on the Tx cores, then only one Rx queue will be used.
> >>>
> >>> I hope that makes sense.
> >> I think there is a misunderstanding of the problem. Indeed the problem is not Pktgen.
> >> Here is my command --> ./app/app/x86_64-native-linuxapp-gcc/pktgen -c ffc0000 -n 4 -w 84:00.0 -w 84:00.1 --file-prefix pktgen_F2 --socket-mem 1000,2000,1000,1000 -- -T -P -m "[18-19:20-21].0, [22:23].1"
> >>
> >> The problem is that when I run the symmetric_mp example with $numberOfProcesses=8 cores, I get lower throughput (roughly 8.4 Gb/s), but when I run it with $numberOfProcesses=3 cores the throughput is 10 Gb/s.
> >> for i in `seq $numberOfProcesses`;
> >> do
> >>     .... some calculation goes here.....
> >>     symmetric_mp -c $coremask -n 2 --proc-type=auto -w 0b:00.0 -w 0b:00.1 --file-prefix sm --socket-mem 4000,1000,1000,1000 -- -p 3 --num-procs=$numberOfProcesses --proc-id=$procid";
> >>     .....
> >> done
> > Most NICs have a limited amount of memory on the NIC, and when you start to segment that memory because you are using more queues it can affect performance.
> >
> > On one of the NICs, if you go over say 5 or 6 queues, the memory per queue for Rx/Tx packets starts to become a bottleneck, as you do not have enough memory in the Tx/Rx queues to hold enough packets. This can cause the NIC to drop Rx packets because the host cannot pull the data from the NIC, or from the Rx ring on the host, fast enough. This seems to be the problem, since the amount of time to process a packet on the host has not changed; only the amount of buffer space in the NIC changes as you increase the number of queues.
> >
> > I am not sure this is your issue, but I figured I would state this point.
> What you said sounded logical, but is there a way I can be sure? I mean, are there registers on the NIC which show the number of packets lost on the NIC, or does DPDK have an API which reports packet loss at the NIC level?
Yes, if you look in the docs at Readthedocs.org/projects/dpdk you can find the API, something like rte_eth_stats_get() (see the sketch at the end of this message).

> >
> >> I am trying to find out what causes this loss!
> >>
> >>>>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani <irsa...@zoho.com> wrote:
> >>>>>>
> >>>>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
> >>>>>>> On Tue, 28 Aug 2018 17:34:27 +0430 Saber Rezvani <irsa...@zoho.com> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I have run the multi_process/symmetric_mp example in the DPDK examples directory.
> >>>>>>>> With one process its throughput is line rate, but as I increase the number of cores I see a decrease in throughput. For example, if the number of queues is set to 4 and each queue is assigned to a single core, the throughput is about 9.4 Gb/s; with 8 queues, the throughput is about 8.5 Gb/s.
> >>>>>>>>
> >>>>>>>> I have read the following, but it was not convincing.
> >>>>>>>>
> >>>>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
> >>>>>>>>
> >>>>>>>> I am eagerly looking forward to hearing from you, all.
> >>>>>>>>
> >>>>>>>> Best wishes,
> >>>>>>>>
> >>>>>>>> Saber
> >>>>>>>>
> >>>>>>> Not completely surprising. If you have more cores than the packet line rate needs, then the number of packets returned by each call to rx_burst will be smaller. With a large number of cores, most of the time will be spent reading PCI registers for no packets!
> >>>>>> Indeed pktgen says it is generating traffic at line rate, but receiving less than 10 Gb/s. So, in that case there should be something that causes the reduction in throughput :(
> >>>>>>
> >>>>> Regards,
> >>>>> Keith
> >>>>>
> >>> Regards,
> >>> Keith
> >>>
> >> Best regards,
> >> Saber
> > Regards,
> > Keith
> >
> Best regards,
> Saber

Regards,
Keith
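
P.S. In case it helps, here is a minimal, untested sketch of what I mean, assuming a recent DPDK release (one where rte_eth_stats_get() returns an int and port ids are uint16_t). A growing imissed count is the usual sign that the NIC itself is dropping Rx packets because the host is not draining the Rx rings fast enough:

  #include <inttypes.h>
  #include <stdio.h>
  #include <rte_ethdev.h>

  /* Print the counters that indicate NIC-level loss for one port:
   *   imissed   - packets dropped by the hardware because the Rx queues were full
   *   ierrors   - erroneous received packets
   *   rx_nombuf - Rx mbuf allocation failures on the host
   */
  static void
  show_port_drops(uint16_t port_id)
  {
          struct rte_eth_stats stats;

          if (rte_eth_stats_get(port_id, &stats) != 0) {
                  printf("port %u: unable to read stats\n", port_id);
                  return;
          }
          printf("port %u: ipackets=%" PRIu64 " imissed=%" PRIu64
                 " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
                 port_id, stats.ipackets, stats.imissed,
                 stats.ierrors, stats.rx_nombuf);
  }

If the basic stats are not detailed enough, rte_eth_xstats_get() exposes the extended counters, which on many PMDs include per-queue numbers, so you can see whether one queue is dropping more than the others.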
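
P.P.S. On the per-queue buffering point above, one thing you can also check (again only a sketch under the same assumptions, and not necessarily the fix here) is how many descriptors per queue the port supports versus what the application actually requests in rte_eth_rx_queue_setup()/rte_eth_tx_queue_setup(). If the configured rings are far below the limit, larger rings can sometimes absorb bursts better when many queues share the NIC:

  #include <stdio.h>
  #include <rte_ethdev.h>

  /* Report the queue counts and per-queue descriptor limits the device advertises. */
  static void
  show_queue_limits(uint16_t port_id)
  {
          struct rte_eth_dev_info dev_info;

          rte_eth_dev_info_get(port_id, &dev_info);
          printf("port %u: max_rx_queues=%u rx_desc_max=%u tx_desc_max=%u\n",
                 port_id, (unsigned)dev_info.max_rx_queues,
                 (unsigned)dev_info.rx_desc_lim.nb_max,
                 (unsigned)dev_info.tx_desc_lim.nb_max);
  }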