Implemented -z,--compression_level=n option that enables compression
of mmaped kernel data buffers content in runtime during perf record
mode collection.

Compression overhead has been measured for serial and AIO streaming
when profiling matrix multiplication workload:

    -------------------------------------------------------------
    | SERIAL                      | AIO-1                       |
----------------------------------------------------------------|
|-z | OVH(x) | ratio(x) size(MiB) | OVH(x) | ratio(x) size(MiB) |
|---------------------------------------------------------------|
| 0 | 1,00   | 1,000    179,424   | 1,00   | 1,000    187,527   |
| 1 | 1,04   | 8,427    181,148   | 1,01   | 8,474    188,562   |
| 2 | 1,07   | 8,055    186,953   | 1,03   | 7,912    191,773   |
| 3 | 1,04   | 8,283    181,908   | 1,03   | 8,220    191,078   |
| 5 | 1,09   | 8,101    187,705   | 1,05   | 7,780    190,065   |
| 8 | 1,05   | 9,217    179,191   | 1,12   | 6,111    193,024   |
-----------------------------------------------------------------

OVH = (Execution time with -z N) / (Execution time with -z 0)

ratio - compression ratio
size  - number of bytes that was compressed

        size ~= trace size x ratio

Signed-off-by: Alexey Budankov <alexey.budan...@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt | 5 +++++
 tools/perf/builtin-record.c              | 6 +++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index d1e6c1fd7387..632502b1f335 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -472,6 +472,11 @@ Also at some cases executing less trace write syscalls 
with bigger data size can
 shorter than executing more trace write syscalls with smaller data size thus 
lowering
 runtime profiling overhead.
 
+-z::
+--compression-level=n::
+Produce compressed trace using specified level n (no compression: 0 - default,
+fastest compression: 1, smallest trace: 22)
+
 --all-kernel::
 Configure all used events to run in kernel space.
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 47e0abe22192..9f1bba6d4331 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -346,7 +346,7 @@ static void record__aio_mmap_read_sync(struct record *rec)
        struct perf_evlist *evlist = rec->evlist;
        struct perf_mmap *maps = evlist->mmap;
 
-       if (!rec->opts.nr_cblocks)
+       if (!record__aio_enabled(rec))
                return;
 
        for (i = 0; i < evlist->nr_mmaps; i++) {
@@ -2166,6 +2166,10 @@ static struct option __record_options[] = {
        OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu",
                     "Set affinity mask of trace reading thread to NUMA node 
cpu mask or cpu of processed mmap buffer",
                     record__parse_affinity),
+#ifdef HAVE_ZSTD_SUPPORT
+       OPT_UINTEGER('z', "compression-level", &record.opts.comp_level,
+                    "Produce compressed trace using specified level (default: 
0, fastest: 1, smallest: 22)"),
+#endif
        OPT_END()
 };
 
-- 
2.20.1

Reply via email to