On Fri, Oct 18, 2019 at 8:48 AM Honnappa Nagarahalli
<honnappa.nagaraha...@arm.com> wrote:
>
> <snip>
>
> > Subject: Re: [PATCH v4 1/2] lib/ring: apis to support configurable element
> > size
> >
> > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results
> > >>> are as follows. The numbers in brackets are with the code on master.
> > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > >>>
> > >>> RTE>>ring_perf_elem_autotest
> > >>> ### Testing single element and burst enq/deq ###
> > >>> SP/SC single enq/dequeue: 5
> > >>> MP/MC single enq/dequeue: 40 (35)
> > >>> SP/SC burst enq/dequeue (size: 8): 2
> > >>> MP/MC burst enq/dequeue (size: 8): 6
> > >>> SP/SC burst enq/dequeue (size: 32): 1 (2)
> > >>> MP/MC burst enq/dequeue (size: 32): 2
> > >>>
> > >>> ### Testing empty dequeue ###
> > >>> SC empty dequeue: 2.11
> > >>> MC empty dequeue: 1.41 (2.11)
> > >>>
> > >>> ### Testing using a single lcore ###
> > >>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > >>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > >>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > >>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > >>>
> > >>> ### Testing using two physical cores ###
> > >>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > >>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > >>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> > >>>
> > >>> ### Testing using two NUMA nodes ###
> > >>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > >>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > >>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > >>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > >>>
> > >>> On one of the Arm platforms:
> > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest
> > >>> are ok)
> >
> > Tried this on a Power9 platform (3.6GHz), with two numa nodes and 16
> > cores/node (SMT=4).  Applied all 3 patches in v5, test results are as
> > follows:
> >
> > RTE>>ring_perf_elem_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 42
> > MP/MC single enq/dequeue: 59
> > SP/SC burst enq/dequeue (size: 8): 5
> > MP/MC burst enq/dequeue (size: 8): 7
> > SP/SC burst enq/dequeue (size: 32): 2
> > MP/MC burst enq/dequeue (size: 32): 2
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 7.81
> > MC empty dequeue: 7.81
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 5.76
> > MP/MC bulk enq/dequeue (size: 8): 7.66
> > SP/SC bulk enq/dequeue (size: 32): 2.10
> > MP/MC bulk enq/dequeue (size: 32): 2.57
> >
> > ### Testing using two hyperthreads ###
> > SP/SC bulk enq/dequeue (size: 8): 13.13
> > MP/MC bulk enq/dequeue (size: 8): 13.98
> > SP/SC bulk enq/dequeue (size: 32): 3.41
> > MP/MC bulk enq/dequeue (size: 32): 4.45
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 11.00
> > MP/MC bulk enq/dequeue (size: 8): 10.95
> > SP/SC bulk enq/dequeue (size: 32): 3.08
> > MP/MC bulk enq/dequeue (size: 32): 3.40
> >
> > ### Testing using two NUMA nodes ###
> > SP/SC bulk enq/dequeue (size: 8): 63.41
> > MP/MC bulk enq/dequeue (size: 8): 62.70
> > SP/SC bulk enq/dequeue (size: 32): 15.39
> > MP/MC bulk enq/dequeue (size: 32): 22.96
> >
> Thanks for running this. There is another test 'ring_perf_autotest' which 
> provides the numbers with the original implementation. The goal is to make 
> sure the numbers with the original implementation are the same as these. Can 
> you please run that as well?

Honnappa,

Your earlier perf report shows cycle counts below 1. That is because
it is using a 50 or 100 MHz clock in EL0 rather than the CPU cycle counter.
Please check with the PMU counter. See "ARM64 profiling" in

http://doc.dpdk.org/guides/prog_guide/profile_app.html
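
To illustrate the scaling involved: a counter running at 100 MHz ticks far
less often than the core clock, so a cost of a few CPU cycles per operation
shows up as a fraction of a tick. A minimal sketch in C, assuming a
hypothetical 2.0 GHz core clock (the actual frequency of the Arm platform is
not stated in this thread); ticks_to_cpu_cycles is a helper name made up for
this example, not a DPDK API:

```c
#include <assert.h>

/*
 * Scale a tick count taken from a fixed-frequency timer into an
 * approximate number of CPU cycles.
 * timer_hz and cpu_hz are assumptions supplied by the caller; they are
 * not read from the hardware here.
 */
static double ticks_to_cpu_cycles(double ticks, double timer_hz, double cpu_hz)
{
    assert(timer_hz > 0.0);
    assert(cpu_hz > 0.0);
    return ticks * (cpu_hz / timer_hz);
}
```

With these assumed frequencies, ticks_to_cpu_cycles(0.37, 100e6, 2.0e9) comes
out at roughly 7.4, so the reported 0.37 would be on the order of 7 CPU
cycles. That is only an estimate; the PMU cycle counter gives the real figure.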


Here are the octeontx2 values. There is a regression in the two-core cases,
as you reported earlier on x86.


RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 288
MP/MC single enq/dequeue: 452
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 61
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 21

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.35
MP/MC bulk enq/dequeue (size: 8): 67.36
SP/SC bulk enq/dequeue (size: 32): 13.10
MP/MC bulk enq/dequeue (size: 32): 21.64

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.94
MP/MC bulk enq/dequeue (size: 8): 107.66
SP/SC bulk enq/dequeue (size: 32): 24.51
MP/MC bulk enq/dequeue (size: 32): 33.23
Test OK
RTE>>

---- after applying v5 of the patch ------

RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 289
MP/MC single enq/dequeue: 452
SP/SC burst enq/dequeue (size: 8): 40
MP/MC burst enq/dequeue (size: 8): 64
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 22

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 39.73
MP/MC bulk enq/dequeue (size: 8): 69.13
SP/SC bulk enq/dequeue (size: 32): 13.44
MP/MC bulk enq/dequeue (size: 32): 22.00

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 76.02
MP/MC bulk enq/dequeue (size: 8): 112.50
SP/SC bulk enq/dequeue (size: 32): 24.71
MP/MC bulk enq/dequeue (size: 32): 33.34
Test OK
RTE>>

RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 290
MP/MC single enq/dequeue: 503
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 63
SP/SC burst enq/dequeue (size: 32): 11
MP/MC burst enq/dequeue (size: 32): 19

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.92
MP/MC bulk enq/dequeue (size: 8): 62.54
SP/SC bulk enq/dequeue (size: 32): 11.46
MP/MC bulk enq/dequeue (size: 32): 19.89

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 87.55
MP/MC bulk enq/dequeue (size: 8): 99.10
SP/SC bulk enq/dequeue (size: 32): 26.63
MP/MC bulk enq/dequeue (size: 32): 29.91
Test OK
RTE>>
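
To quantify the two-core difference, the relative change between two of the
runs above can be computed as sketched below. pct_change is a helper name
invented for this example, and treating the pre-patch ring_perf_autotest run
as the baseline for the ring_perf_elem_autotest numbers is an assumption about
which runs to compare.

```c
#include <assert.h>

/*
 * Relative change from baseline to measured, in percent.
 * The values are cycles per operation, so a positive result means the
 * measured run is slower than the baseline.
 */
static double pct_change(double baseline, double measured)
{
    assert(baseline > 0.0);
    return (measured - baseline) / baseline * 100.0;
}
```

For SP/SC bulk enq/dequeue (size: 8) on two physical cores,
pct_change(75.94, 87.55) is about 15.3, while for MP/MC bulk enq/dequeue
(size: 8), pct_change(107.66, 99.10) is about -7.95.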



> > Dave
