> When testing ring performance in the case where multiple lcores are mapped to
> the same physical core, e.g. --lcores '(0-3)@10', it takes a very long time
> for "enqueue_dequeue_bulk_helper" to finish. This is because the iteration
> count is too high and enqueue/dequeue are extremely inefficient with this
> kind of core mapping. The following test results show this phenomenon:
>
> x86-Intel(R) Xeon(R) Gold 6240:
> $sudo ./app/test/dpdk-test --lcores '(0-1)@25'
> Testing using two hyperthreads(bulk (size: 8):)
> iter_shift: 3 5 7 9 11 13 *15 17 19 21 23
> run time: 7s 7s 7s 8s 9s 16s 47s 170s 660s >0.5h >1h
> legacy APIs: SP/SC: 37 11 6 40525 40525 40209 40367 40407 40541 NoData NoData
> legacy APIs: MP/MC: 56 14 11 50657 40526 40526 40526 40625 40585 NoData NoData
>
> aarch64-n1sdp:
> $sudo ./app/test/dpdk-test --lcores '(0-1)@1'
> Testing using two hyperthreads(bulk (size: 8):)
> iter_shift: 3 5 7 9 11 13 *15 17 19 21 23
> run time: 8s 8s 8s 9s 9s 14s 34s 111s 418s 25min >1h
> legacy APIs: SP/SC: 0.4 0.2 0.1 488 488 488 488 488 489 489 NoData
> legacy APIs: MP/MC: 0.4 0.3 0.2 488 488 488 488 490 489 489 NoData
>
> As the number of iterations increases, so does the time required to run the
> test. Currently (iter_shift = 23), it takes more than 1 hour for the test to
> finish. To fix this, "iter_shift" should be decreased while still ensuring
> enough iterations to keep the test data stable. To verify this, we also
> tested with the "-l" EAL argument:
>
> x86-Intel(R) Xeon(R) Gold 6240:
> $sudo ./app/test/dpdk-test -l 25-26
> Testing using two NUMA nodes(bulk (size: 8):)
> iter_shift: 3 5 7 9 11 13 *15 17 19 21 23
> run time: 6s 6s 6s 6s 6s 6s 6s 7s 8s 11s 27s
> legacy APIs: SP/SC: 47 20 13 22 54 83 91 73 81 75 95
> legacy APIs: MP/MC: 44 18 18 240 245 270 250 249 252 250 253
>
> aarch64-n1sdp:
> $sudo ./app/test/dpdk-test -l 1-2
> Testing using two physical cores(bulk (size: 8):)
> iter_shift: 3 5 7 9 11 13 *15 17 19 21 23
> run time: 8s 8s 8s 8s 8s 8s 8s 9s 9s 11s 23s
> legacy APIs: SP/SC: 0.7 0.4 1.2 1.8 2.0 2.0 2.0 2.0 2.0 2.0 2.0
> legacy APIs: MP/MC: 0.3 0.4 1.3 1.9 2.9 2.9 2.9 2.9 2.9 2.9 2.9
>
> According to the above test data, when "iter_shift" is set to 15, the test
> run time drops below 1 minute and the test results remain stable on both
> the x86 and aarch64 servers.
>
> Fixes: 1fa5d0099efc ("test/ring: add custom element size performance tests")
> Cc: [email protected]
> Cc: [email protected]
>
> Signed-off-by: Feifei Wang <[email protected]>
> Reviewed-by: Honnappa Nagarahalli <[email protected]>
> Reviewed-by: Ruifeng Wang <[email protected]>
> ---
> app/test/test_ring_perf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> index e63e25a86..fd82e2041 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c
> @@ -178,7 +178,7 @@ enqueue_dequeue_bulk_helper(const unsigned int flag,
> const int esize,
> struct thread_params *p)
> {
> int ret;
> - const unsigned int iter_shift = 23;
> + const unsigned int iter_shift = 15;
> const unsigned int iterations = 1 << iter_shift;
> struct rte_ring *r = p->r;
> unsigned int bsize = p->size;
> --
I think it would be better to rework the test(s)
to terminate after some timeout (30s or so), and report the number of ops
completed within that timeout.
Anyway, as a short-term fix, I am OK with it.
Acked-by: Konstantin Ananyev <[email protected]>
> 2.17.1