On 08/29/2018 11:22 PM, Wiles, Keith wrote:
>
>> On Aug 29, 2018, at 12:19 PM, Saber Rezvani <irsa...@zoho.com> wrote:
>>
>>
>>
>> On 08/29/2018 01:39 AM, Wiles, Keith wrote:
>>>> On Aug 28, 2018, at 2:16 PM, Saber Rezvani <irsa...@zoho.com> wrote:
>>>>
>>>>
>>>>
>>>> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
>>>>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a performance problem.
>>>> I use Pktgen version 3.0.0. Indeed it is OK as long as I use one core (10 Gb/s), but when I increase the number of cores (one core per queue) I lose some performance (roughly 8.5 Gb/s for 8 cores). In my scenario Pktgen shows it is generating at line rate, but receiving only 8.5 Gb/s.
>>>> Is it because of Pktgen?
>>> Normally Pktgen can receive at line rate up to 10G with 64-byte frames, which means Pktgen should not be the problem. You can verify that by looping the cable from one port to another on the pktgen machine to create an external loopback. Then send whatever traffic you can from one port; you should be able to receive those packets unless something is configured wrong.
>>>
>>> Please send me the command line for pktgen.
>>>
>>> In pktgen, if you have this config -m "[1-4:5-8].0" then you have 4 cores sending traffic and 4 cores receiving packets.
>>>
>>> In this case the TX cores will be sending the packets on all 4 lcores to the same port. On the RX side you have 4 cores polling 4 RX queues. The RX queues are controlled by RSS, which means the 5-tuple hash of the inbound traffic must divide the packets across all 4 queues so that each core does the same amount of work. If you are sending only a single flow from the TX cores then only one RX queue will be used.
>>>
>>> I hope that makes sense.
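For illustration, here is a minimal sketch of what the receive-side RSS setup looks like with the generic ethdev API, so the 5-tuple hash can spread flows across the RX queues. The queue counts and the rss_hf mask below are assumptions for the sketch, not taken from either setup, and error handling is omitted:

#include <rte_ethdev.h>

static int setup_rss_port(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
{
        struct rte_eth_conf conf = {
                .rxmode = { .mq_mode = ETH_MQ_RX_RSS },
                .rx_adv_conf = {
                        .rss_conf = {
                                .rss_key = NULL,  /* use the driver's default key */
                                .rss_hf  = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
                        },
                },
        };

        /* one RX queue per polling core; RSS decides which queue a flow lands in */
        return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
}

The practical consequence is the one described above: if the generator sends only a single flow, the hash puts every packet into one RX queue and only one of the RX cores does any work.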
of the problem. Indeed the problem is not the Pktgen. >> Here is my command --> 
./app/app/x86_64-native-linuxapp-gcc/pktgen -c ffc0000 -n 4 -w 84:00.0 -w 
84:00.1 --file-prefix pktgen_F2 --socket-mem 1000,2000,1000,1000 -- -T -P -m 
"[18-19:20-21].0, [22:23].1" >> >> The problem is when I run the symmetric_mp 
example for $numberOfProcesses=8 cores, then I have less throughput (roughly 
8.4 Gb/s). but when I run it for $numberOfProcesses=3 cores throughput is 10G. 
>> for i in `seq $numberOfProcesses`; >> do >> .... some calculation goes 
here..... >> symmetric_mp -c $coremask -n 2 --proc-type=auto -w 0b:00.0 -w 
0b:00.1 --file-prefix sm --socket-mem 4000,1000,1000,1000 -- -p 3 
--num-procs=$numberOfProcesses --proc-id=$procid"; >> ..... >> done > Most NICs 
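For context, each of those symmetric_mp processes ends up running a loop roughly like the sketch below. This is a simplification, not the actual example code; the port numbers and burst size are placeholders. Each process polls the RX queue matching its --proc-id on both ports and forwards between them, so every extra process adds another RX/TX queue pair on each NIC:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Each process polls its own RX queue (== proc_id) on both ports and
 * forwards between them, so adding processes adds RX/TX queue pairs. */
static void proc_fwd_loop(uint16_t port_a, uint16_t port_b, uint16_t queue_id)
{
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {
                uint16_t nb_rx = rte_eth_rx_burst(port_a, queue_id, bufs, BURST_SIZE);
                if (nb_rx == 0)
                        continue;

                uint16_t nb_tx = rte_eth_tx_burst(port_b, queue_id, bufs, nb_rx);
                /* free anything the TX queue could not take */
                while (nb_tx < nb_rx)
                        rte_pktmbuf_free(bufs[nb_tx++]);
        }
}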
> Most NICs have a limited amount of memory on the NIC, and when you start to segment that memory because you are using more queues, it can affect performance.
>
> On one of the NICs, if you go over say 5 or 6 queues, the memory per queue for Rx/Tx packets starts to become a bottleneck, as you do not have enough memory in the Tx/Rx queues to hold enough packets. This can cause the NIC to drop Rx packets because the host cannot pull the data from the NIC, or from the Rx ring on the host, fast enough. This seems to be the problem, as the amount of time to process a packet on the host has not changed; only the amount of buffer space per queue in the NIC shrinks as you increase the number of queues.
>
> I am not sure this is your issue, but I figured I would state this point.
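One related knob on the host side is the RX descriptor ring size per queue. It does not change the NIC's internal packet buffer described above, but it is the part you can grow from software. A hedged sketch, with the 4096 value and the mempool as placeholders:

#include <rte_ethdev.h>

static int setup_big_rx_ring(uint16_t port_id, uint16_t queue_id,
                             struct rte_mempool *pool, unsigned int socket_id)
{
        struct rte_eth_dev_info info;
        uint16_t nb_rxd = 4096;                 /* desired ring size, placeholder */

        rte_eth_dev_info_get(port_id, &info);
        if (nb_rxd > info.rx_desc_lim.nb_max)   /* clamp to what the NIC allows */
                nb_rxd = info.rx_desc_lim.nb_max;

        /* NULL rxconf keeps the driver defaults */
        return rte_eth_rx_queue_setup(port_id, queue_id, nb_rxd, socket_id, NULL, pool);
}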
What you said sounded logical, but is there a way that I can be sure? I mean, are there registers on the NIC which show the number of packets lost on the NIC, or does DPDK have an API which shows the number of packets lost at the NIC level?
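There is a generic API for this. A minimal sketch using rte_eth_stats_get(), where imissed counts packets the hardware dropped because no RX buffers were available (RX queues full); rte_eth_xstats_get() and rte_eth_xstats_get_names() expose the driver-specific counters behind it. The helper below is only for illustration:

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

static void print_rx_drops(uint16_t port_id)
{
        struct rte_eth_stats stats;

        if (rte_eth_stats_get(port_id, &stats) == 0) {
                /* imissed:   dropped by HW, e.g. RX descriptor ring was full */
                /* ierrors:   packets the NIC flagged as erroneous            */
                /* rx_nombuf: RX failures because the mbuf pool was exhausted */
                printf("port %u: imissed=%" PRIu64 " ierrors=%" PRIu64
                       " rx_nombuf=%" PRIu64 "\n",
                       port_id, stats.imissed, stats.ierrors, stats.rx_nombuf);
        }
}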
>
>> I am trying to find out what causes this loss!
>>
>>
>>>>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani <irsa...@zoho.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
>>>>>>> On Tue, 28 Aug 2018 17:34:27 +0430
>>>>>>> Saber Rezvani <irsa...@zoho.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> I have run the multi_process/symmetric_mp example from the DPDK examples directory.
>>>>>>>> For one process its throughput is line rate, but as I increase the
>>>>>>>> number of cores I see a decrease in throughput. For example, if the number
>>>>>>>> of queues is set to 4 and each queue is assigned to a single core, then the
>>>>>>>> throughput will be about 9.4 Gb/s; with 8 queues, the throughput
>>>>>>>> will be 8.5 Gb/s.
>>>>>>>>
>>>>>>>> I have read the following, but it was not convincing.
>>>>>>>>
>>>>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
>>>>>>>>
>>>>>>>>
>>>>>>>> I am eagerly looking forward to hearing from you all.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>>
>>>>>>>> Saber
>>>>>>>>
>>>>>>>>
>>>>>>> Not completely surprising. If you have more cores than the packet line rate needs,
>>>>>>> then the number of packets returned for each call to rx_burst will be smaller.
>>>>>>> With a large number of cores, most of the time will be spent doing reads of
>>>>>>> PCI registers for no packets!
>>>>>> Indeed pktgen says it is generating traffic at line rate, but receiving less than 10 Gb/s. So, in that case there should be something that causes the reduction in throughput :(
>>>>>>
>>>>>>
>>>>> Regards,
>>>>> Keith
>>>>>
>>>>
>>> Regards,
>>> Keith
>>>
>> Best regards,
>> Saber
> Regards,
> Keith
>

Best regards,
Saber
