Hi Artur, Shally,

> -----Original Message-----
> From: Trybula, ArturX
> Sent: Monday, July 15, 2019 3:01 PM
> To: Shally Verma <shal...@marvell.com>; dev@dpdk.org; Trahe, Fiona 
> <fiona.tr...@intel.com>;
> Dybkowski, AdamX <adamx.dybkow...@intel.com>; akhil.go...@nxp.com
> Subject: RE: [EXT] [PATCH v4 1/1] app/test-compress-perf: report header 
> improvement
> 
> -----Original Message-----
> From: Shally Verma [mailto:shal...@marvell.com]
> Sent: Monday, July 15, 2019 14:47
> To: Trybula, ArturX <arturx.tryb...@intel.com>; dev@dpdk.org; Trahe, Fiona 
> <fiona.tr...@intel.com>;
> Dybkowski, AdamX <adamx.dybkow...@intel.com>; akhil.go...@nxp.com
> Subject: RE: [EXT] [PATCH v4 1/1] app/test-compress-perf: report header 
> improvement
> 
> 
> 
> > -----Original Message-----
> > From: Artur Trybula <arturx.tryb...@intel.com>
> > Sent: Friday, July 12, 2019 4:13 PM
> > To: dev@dpdk.org; fiona.tr...@intel.com; Shally Verma
> > <shal...@marvell.com>; adamx.dybkow...@intel.com;
> > arturx.tryb...@intel.com; akhil.go...@nxp.com
> > Subject: [EXT] [PATCH v4 1/1] app/test-compress-perf: report header
> > improvement
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > This patch adds extra features to the compress performance test. Some
> > important parameters (memory allocation, number of ops, number of
> > segments) are calculated and printed out on the screen.
> > Information about compression threads is also provided.
> >
> > Signed-off-by: Artur Trybula <arturx.tryb...@intel.com>
> > ---
> >  .../comp_perf_test_benchmark.c                | 21 ++++-
> >  .../comp_perf_test_common.c                   | 94 ++++++++++++++++++-
> >  .../comp_perf_test_common.h                   |  6 ++
> >  app/test-compress-perf/main.c                 |  4 +-
> >  4 files changed, 120 insertions(+), 5 deletions(-)
> >
> > diff --git a/app/test-compress-perf/comp_perf_test_benchmark.c
> > b/app/test-compress-perf/comp_perf_test_benchmark.c
> > index aa1f8eea2..887459449 100644
> > --- a/app/test-compress-perf/comp_perf_test_benchmark.c
> > +++ b/app/test-compress-perf/comp_perf_test_benchmark.c
> > @@ -329,9 +329,26 @@ cperf_benchmark_test_runner(void *test_ctx)
> >     struct comp_test_data *test_data = ctx->ver.options;
> >     uint32_t lcore = rte_lcore_id();
> >     static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> > +   int i, ret = EXIT_SUCCESS;
> >
> >     ctx->ver.mem.lcore_id = lcore;
> > -   int i, ret = EXIT_SUCCESS;
> > +
> > +   /*
> > +    * printing information about current compression thread
> > +    */
> > +   if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
> Question: runner() will be executed by each core so is ctx is thread local 
> variable or global.? If it can be
> made thread local, then there would be no need of atomic here?
> [Artur] Runners are restarted for each compression level. It's enough to have 
> the data printed only
> once.
[Fiona] added clarification - we originally did as you suggest, but in cases 
where there are multiple params, e.g. level 1-9, the thread is started/stopped 
for each level, so the header was printing out many times - always with the 
same data. So we needed the atomic to resolve.

> 
> > +           printf("    lcore: %u,"
> > +                           " driver name: %s,"
> > +                           " device name: %s,"
> > +                           " device id: %u,"
> > +                           " socket id: %u,"
> > +                           " queue pair id: %u\n",
> > +                   lcore,
> > +                   ctx->ver.options->driver_name,
> > +                   rte_compressdev_name_get(ctx->ver.mem.dev_id),
> > +                   ctx->ver.mem.dev_id,
> > +                   rte_compressdev_socket_id(ctx->ver.mem.dev_id),
> > +                   ctx->ver.mem.qp_id);
> >
> >     /*
> >      * First the verification part is needed @@ -374,7 +391,7 @@
> > cperf_benchmark_test_runner(void *test_ctx)
> >                     1000000000;
> >
> >     if (rte_atomic16_test_and_set(&display_once)) {
> > -           printf("%12s%6s%12s%17s%15s%16s\n",
> > +           printf("\n%12s%6s%12s%17s%15s%16s\n",
> >                     "lcore id", "Level", "Comp size", "Comp ratio [%]",
> >                     "Comp [Gbps]", "Decomp [Gbps]");
> >     }
> > diff --git a/app/test-compress-perf/comp_perf_test_common.c
> > b/app/test- compress-perf/comp_perf_test_common.c
> > index 472c76686..3dc9349b0 100644
> > --- a/app/test-compress-perf/comp_perf_test_common.c
> > +++ b/app/test-compress-perf/comp_perf_test_common.c
> > @@ -16,6 +16,18 @@
> >
> >  #define DIV_CEIL(a, b)  ((a) / (b) + ((a) % (b) != 0))
> >
> > +struct performance_tests_results {
> > +   uint16_t total_segments;
> > +   uint16_t segment_sz;
> > +   uint16_t last_segment_sz;
> > +   uint32_t total_buffs;         /*number of buffers = number of ops*/
> > +   uint16_t segments_per_buff;
> > +   uint16_t segments_per_last_buff;
> > +   size_t input_data_sz;
> > +};
> These looks more like test configuration than result. If you agree, then can 
> rename it to test_config ..
> [Artur] These are exactly tests results, internal variables used by the 
> algorithm. A kind of snapshot.
> Config by definition is used for configuration. In this case all the fields 
> are only printed out.
[Fiona] I can see the confusion, it's not the input config - but it's also 
confusing to call it results, as that implies the throughput and ratio numbers. 
How about cperf_buffer_info?
> 
> > +
> > +static struct performance_tests_results tests_res;
> > +
> >  int
> >  param_range_check(uint16_t size, const struct rte_param_log2_range
> > *range)  { @@ -170,6 +182,13 @@ comp_perf_allocate_memory(struct
> > comp_test_data *test_data,
> >                             " could not be allocated\n");
> >             return -1;
> >     }
> > +
> > +   tests_res.total_segments = total_segs;
> > +   tests_res.segment_sz = test_data->seg_sz;
> > +   tests_res.total_buffs = mem->total_bufs;
> > +   tests_res.segments_per_buff = test_data->max_sgl_segs;
> > +   tests_res.input_data_sz = test_data->input_data_sz;
> > +
> >     return 0;
> >  }
> >
> > @@ -178,9 +197,10 @@ prepare_bufs(struct comp_test_data *test_data,
> > struct cperf_mem_resources *mem)  {
> >     uint32_t remaining_data = test_data->input_data_sz;
> >     uint8_t *input_data_ptr = test_data->input_data;
> > -   size_t data_sz;
> > +   size_t data_sz = 0;
> >     uint8_t *data_addr;
> >     uint32_t i, j;
> > +   uint16_t segs_per_mbuf = 0;
> >
> Minimum segment per mbuf is 1. Then why don’t we initialize it to 1 here?
> [Artur] You are right Shally, if everything is fine with the input params and 
> with the algorithm, but it
> should be left 0 in case of problems with mbufs allocation (current 
> implementation).
> 
> 
> >     for (i = 0; i < mem->total_bufs; i++) {
> >             /* Allocate data in input mbuf and copy data from input file */ 
> > @@
> > -204,7 +224,7 @@ prepare_bufs(struct comp_test_data *test_data, struct
> > cperf_mem_resources *mem)
> >             remaining_data -= data_sz;
> >
> >             /* Already one segment in the mbuf */
> > -           uint16_t segs_per_mbuf = 1;
> > +           segs_per_mbuf = 1;
> >
> >             /* Chain mbufs if needed for input mbufs */
> >             while (segs_per_mbuf < test_data->max_sgl_segs @@ -
> > 281,5 +301,75 @@ prepare_bufs(struct comp_test_data *test_data, struct
> > cperf_mem_resources *mem)
> >             }
> >     }
> >
> > +   tests_res.segments_per_last_buff = segs_per_mbuf;
> > +   tests_res.last_segment_sz = data_sz;
> > +
> >     return 0;
> >  }
> > +
> > +void
> > +print_test_dynamics(void)
> > +{
> > +   uint32_t opt_total_segs = DIV_CEIL(tests_res.input_data_sz,
> > +                   MAX_SEG_SIZE);
> > +
> > +   if (tests_res.total_buffs > 1) {
> > +           printf("\nWarning: for the current input parameters number"
> A comma after 'input parameters' would improve readability here
> [Artur] - Ok
> 
> > +                           " of ops is higher than one, which may result"
> > +                           " in sub-optimal performance.\n");
> > +           printf("To improve the performance (for the current"
> > +                           " input data) following parameters are"
> > +                           " suggested:\n");
> > +           printf("        • Segment size: %d\n", MAX_SEG_SIZE);
> > +           printf("        • Number of segments: %u\n", opt_total_segs);
> > +   } else if (tests_res.total_buffs == 1) {
> > +           printf("\nWarning: there is only one op with %u segments –"
> May "Warning:" be replaced with "Info: " here.
> [Artur] Ok
> 
> > +                           " the compression ratio is the best.\n",
> > +                   tests_res.segments_per_last_buff);
> > +           if (tests_res.segment_sz < MAX_SEG_SIZE)
> > +                   printf("To reduce compression time, please use"
> > +                                   " bigger segment size: %d.\n",
> > +                           MAX_SEG_SIZE);
> > +           else if (tests_res.segment_sz == MAX_SEG_SIZE)
> > +                   printf("Segment size is optimal for the best"
> > +                                   " performance.\n");
> > +   } else
> > +           printf("Warning: something wrong happened!!\n");
> > +
> > +   printf("\nFor the current input parameters (segment size = %u,"
> > +                   " segments number = %u):\n",
> > +           tests_res.segment_sz,
> > +           tests_res.segments_per_buff);
> > +   printf("        • Total number of buffers: %d\n",
> > +           tests_res.total_segments);
> > +   printf("        • %u buffer(s) %u bytes long, last buffer %u"
> > +                   " byte(s) long\n",
> > +           tests_res.total_segments - 1,
> > +           tests_res.segment_sz,
> > +           tests_res.last_segment_sz);
> > +   printf("        • Number of ops: %u\n", tests_res.total_buffs);
> > +   printf("        • Total memory allocation: %u\n",
> > +           (tests_res.total_segments - 1) * tests_res.segment_sz
> > +           + tests_res.last_segment_sz);
> > +   if (tests_res.total_buffs > 1)
> > +           printf("        • %u ops: %u segment(s) in each,"
> > +                           " segment size %u\n",
> > +                   tests_res.total_buffs - 1,
> > +                   tests_res.segments_per_buff,
> > +                   tests_res.segment_sz);
> > +   if (tests_res.segments_per_last_buff > 1) {
> > +           printf("        • 1 op %u segments:\n",
> > +                           tests_res.segments_per_last_buff);
> > +           printf("                o %u segment size %u\n",
> > +                   tests_res.segments_per_last_buff - 1,
> > +                   tests_res.segment_sz);
> > +           printf("                o last segment size %u\n",
> > +                   tests_res.last_segment_sz);
> > +   } else if (tests_res.segments_per_last_buff == 1) {
> > +           printf("        • 1 op (the last one): %u segment %u"
> > +                           " byte(s) long\n\n",
> > +                   tests_res.segments_per_last_buff,
> > +                   tests_res.last_segment_sz);
> > +   }
> Probably this if and else if here can be replaced by just 1 statement.
> [Artur] I think it's ok. This version is clear. If you like I can remove {} 
> brackets from the "else if".
[Fiona] I think it's ok as is - there are different output whether >1 or ==1, 
so better to leave as separate paths.
> 
> 
> > +   printf("\n");
> > +}
> > diff --git a/app/test-compress-perf/comp_perf_test_common.h
> > b/app/test- compress-perf/comp_perf_test_common.h
> > index 9c11e3a00..c9e0c9081 100644
> > --- a/app/test-compress-perf/comp_perf_test_common.h
> > +++ b/app/test-compress-perf/comp_perf_test_common.h
> > @@ -13,6 +13,9 @@ struct cperf_mem_resources {
> >     uint8_t dev_id;
> >     uint16_t qp_id;
> >     uint8_t lcore_id;
> > +
> > +   rte_atomic16_t print_info_once;
> > +
> >     uint32_t total_bufs;
> >     uint8_t *compressed_data;
> >     uint8_t *decompressed_data;
> > @@ -38,4 +41,7 @@ comp_perf_allocate_memory(struct comp_test_data
> > *test_data,  int  prepare_bufs(struct comp_test_data *test_data,
> > struct
> 
> > cperf_mem_resources *mem);
> >
> > +void
> > +print_test_dynamics(void);
> > +
> >  #endif /* _COMP_PERF_TEST_COMMON_H_ */ diff --git
> > a/app/test-compress-perf/main.c b/app/test-compress-perf/main.c index
> > e746e4708..e7ac412e6 100644
> > --- a/app/test-compress-perf/main.c
> > +++ b/app/test-compress-perf/main.c
> > @@ -363,7 +363,7 @@ main(int argc, char **argv)
> >
> >     printf("App uses socket: %u\n", rte_socket_id());
> >     printf("Burst size = %u\n", test_data->burst_sz);
> > -   printf("File size = %zu\n", test_data->input_data_sz);
> > +   printf("Input data size = %zu\n", test_data->input_data_sz);
> >
> >     test_data->cleanup = ST_DURING_TEST;
> >     total_nb_qps = nb_compressdevs * test_data->nb_qps; @@ -390,6
> > +390,8 @@ main(int argc, char **argv)
> >             i++;
> >     }
> >
> > +   print_test_dynamics(); /* constructors must be executed first */
> > +
> >     while (test_data->level <= test_data->level_lst.max) {
> >
> >             i = 0;
> > --
> > 2.17.1

Reply via email to