On Tue, Mar 14, 2017 at 17:06:57 +0000, Dr. David Alan Gilbert wrote:
> * Emilio G. Cota (c...@braap.org) wrote:
> > It seems that a good benchmark to take translation overhead into account
> > would be gcc/perlbench from SPEC (see [1]; ~20% of exec time is spent
> > on translation). Unfortunately, none of them can be redistributed.
> >
> > I'll consider other options. For instance, I looked today at using golang's
> > compilation tests, but they crash under qemu-user. I'll keep looking
> > at other options -- the requirement is to have something that is easy
> > to build (i.e. gcc is not an option) and that it runs fast.
>
> Yes, needs to be self contained but large enough to be interesting.
> Isn't SPECs perlbench just a variant of a standard free benchmark
> that can be used?
> (Select alternative preferred language).

SPEC takes an old Perl distribution and a few standard Perl benchmarks.
These sources (with SPEC's modifications) are of course redistributable.
However, SPEC also adds scripts that are proprietary.

What I've ended up doing is selecting a small subset of the tests in the
Perl distribution with a profile under QEMU similar to that of SPEC's
perlbench (see patch below). This requires building (and testing) Perl,
which takes a few minutes on a modern machine (ouch), but fortunately it is
only done once. After that, the tests themselves take only a few seconds.

The bummer is that cross-compiling the Perl distro is not officially
supported. But well, at least we now have an easy-to-run "compiler-like"
benchmark, if only for the host's ISA.

I updated the README with profile data -- I'm pasting that update below.
Grab the changes from
  https://github.com/cota/dbt-bench

Here are the numbers for the Perl benchmark, from QEMU v1.7 -> v2.8.
The Y axis is Execution Time in seconds, so lower is better:

  x86_64 Perl Compilation Performance
  Host: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz

  [ASCII plot: execution time in seconds (Y axis, 8.2 to 10) against QEMU
   version (X axis: v1.7.0, v2.0.0, v2.1.0, v2.2.0, v2.3.0, v2.4.0, v2.5.0,
   v2.6.0, v2.7.0, v2.8.0)]

PNGs for Perl + NBench here:
  http://imgur.com/a/LlpxE

Thanks,

		Emilio

commit f4ca2537bffe544779aa3f1814cec9d66dd9a17e
Author: Emilio G. Cota <c...@braap.org>
Date:   Thu Mar 16 12:48:44 2017 -0400

    README: document and quantify the difference between NBench and Perl

    While at it, also show how Perl's perf is very similar to SPEC06's
    perlbench.

    Signed-off-by: Emilio G. Cota <c...@braap.org>

diff --git a/README.md b/README.md
index b6d4037..b4578d6 100644
--- a/README.md
+++ b/README.md
@@ -61,3 +61,111 @@ Other output formats are possible, see `Makefile`.
 valuable files that were never meant to be committed (e.g. scripts). For this
 reason it is best to just clone a fresh QEMU repo to be used with DBT-bench
 rather than using your development tree.
+
+## What is the difference between the benchmarks?
+
+NBench programs are small, with execution time dominated by small code loops. Thus,
+when run under a DBT engine, the resulting performance depends almost entirely
+on the quality of the output code.
+
+The Perl benchmarks compile Perl code. As is common for compilation workloads,
+they execute large amounts of code and show no particular code execution
+hotspots. Thus, the resulting DBT performance depends largely on code
+translation speed.
+
+Quantitatively, the differences can be clearly seen under a profiler. For QEMU
+v2.8.0, we get:
+
+* NBench:
+
+```
+# Samples: 1M of event 'cycles:pp'
+# Event count (approx.): 1111661663176
+#
+# Overhead  Command       Shared Object        Symbol
+# ........  ............  ...................  .........................................
+#
+     6.26%  qemu-x86_64   qemu-x86_64          [.] float64_mul
+     6.24%  qemu-x86_64   qemu-x86_64          [.] roundAndPackFloat64
+     4.18%  qemu-x86_64   qemu-x86_64          [.] subFloat64Sigs
+     2.72%  qemu-x86_64   qemu-x86_64          [.] addFloat64Sigs
+     2.29%  qemu-x86_64   qemu-x86_64          [.] cpu_exec
+     1.29%  qemu-x86_64   qemu-x86_64          [.] float64_add
+     1.12%  qemu-x86_64   qemu-x86_64          [.] float64_sub
+     0.79%  qemu-x86_64   qemu-x86_64          [.] object_class_dynamic_cast_assert
+     0.71%  qemu-x86_64   qemu-x86_64          [.] helper_mulsd
+     0.66%  qemu-x86_64   perf-23090.map       [.] 0x000055afd37d0b8a
+     0.64%  qemu-x86_64   perf-23090.map       [.] 0x000055afd377cd8f
+     0.59%  qemu-x86_64   perf-23090.map       [.] 0x000055afd37d019a
+     [...]
+```
+
+* Perl:
+
+```
+# Samples: 90K of event 'cycles:pp'
+# Event count (approx.): 97757063053
+#
+# Overhead  Command       Shared Object            Symbol
+# ........  ............  .......................  ...........................................
+#
+    22.93%  qemu-x86_64   [kernel.kallsyms]        [k] isolate_freepages_block
+     9.38%  qemu-x86_64   qemu-x86_64              [.] cpu_exec
+     5.69%  qemu-x86_64   qemu-x86_64              [.] tcg_gen_code
+     5.30%  qemu-x86_64   qemu-x86_64              [.] tcg_optimize
+     3.45%  qemu-x86_64   qemu-x86_64              [.] liveness_pass_1
+     3.24%  qemu-x86_64   [kernel.kallsyms]        [k] isolate_migratepages_block
+     2.39%  qemu-x86_64   qemu-x86_64              [.] object_class_dynamic_cast_assert
+     1.48%  qemu-x86_64   [kernel.kallsyms]        [k] unlock_page
+     1.29%  qemu-x86_64   [kernel.kallsyms]        [k] pageblock_pfn_to_page
+     1.29%  qemu-x86_64   qemu-x86_64              [.] tcg_out_opc.isra.13
+     1.11%  qemu-x86_64   qemu-x86_64              [.] tcg_gen_op2
+     0.98%  qemu-x86_64   [kernel.kallsyms]        [k] migrate_pages
+     0.87%  qemu-x86_64   qemu-x86_64              [.] qht_lookup
+     0.83%  qemu-x86_64   qemu-x86_64              [.] tcg_temp_new_internal
+     0.77%  qemu-x86_64   qemu-x86_64              [.] tcg_out_modrm_sib_offset.constprop.37
+     0.76%  qemu-x86_64   qemu-x86_64              [.] disas_insn.isra.49
+     0.70%  qemu-x86_64   [kernel.kallsyms]        [k] __wake_up_bit
+     0.55%  qemu-x86_64   [kernel.kallsyms]        [k] __reset_isolation_suitable
+     0.47%  qemu-x86_64   qemu-x86_64              [.] tcg_opt_gen_mov
+     [...]
+```
+
+### Why don't you just run SPEC06?
+
+SPEC's source code cannot be redistributed. Some of its benchmarks are based
+on free software, but the SPEC authors added on top of it non-free code
+(usually scripts) that cannot be redistributed.
+
+For this reason we use here benchmarks that are freely redistributable,
+while capturing different performance profiles: NBench represents "hotspot
+code" and Perl represents a typical "compiler" workload. In fact, Perl's
+performance profile under QEMU is very similar to that of SPEC06's perlbench;
+compare Perl's profile above with SPEC06 perlbench's below:
+
+```
+# Samples: 14K of event 'cycles:pp'
+# Event count (approx.): 15657871399
+#
+# Overhead  Command      Shared Object            Symbol
+# ........  ...........  .......................  ...........................................
+#
+    16.93%  qemu-x86_64  qemu-x86_64              [.] cpu_exec
+     9.16%  qemu-x86_64  [kernel.kallsyms]        [k] isolate_freepages_block
+     5.47%  qemu-x86_64  qemu-x86_64              [.] tcg_gen_code
+     4.82%  qemu-x86_64  qemu-x86_64              [.] tcg_optimize
+     4.15%  qemu-x86_64  qemu-x86_64              [.] object_class_dynamic_cast_assert
+     3.25%  qemu-x86_64  qemu-x86_64              [.] liveness_pass_1
+     1.55%  qemu-x86_64  qemu-x86_64              [.] qht_lookup
+     1.23%  qemu-x86_64  qemu-x86_64              [.] tcg_gen_op2
+     1.04%  qemu-x86_64  [kernel.kallsyms]        [k] copy_page
+     1.00%  qemu-x86_64  qemu-x86_64              [.] tcg_out_opc.isra.13
+     0.82%  qemu-x86_64  qemu-x86_64              [.] tcg_temp_new_internal
+     0.78%  qemu-x86_64  qemu-x86_64              [.] tcg_out_modrm_sib_offset.constprop.37
+     0.72%  qemu-x86_64  qemu-x86_64              [.] tb_cmp
+     0.69%  qemu-x86_64  [kernel.kallsyms]        [k] isolate_migratepages_block
+     0.67%  qemu-x86_64  qemu-x86_64              [.] disas_insn.isra.49
+     0.53%  qemu-x86_64  qemu-x86_64              [.] object_get_class
+     0.52%  qemu-x86_64  [kernel.kallsyms]        [k] __wake_up_bit
+     [...]
+```
-- 
2.7.4
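
A rough way to compare the profiles in the patch numerically is to sum the overhead column by category. The sketch below is illustrative only and is not part of the patch or of dbt-bench: the `tally` function, the three categories, and the list of symbol prefixes counted as translation work are all assumptions made for this example. The sample rows are copied from the Perl profile.

```python
import re

# Assumption: symbols with these prefixes are counted as translation work.
# This grouping is a guess for illustration, not something QEMU defines.
TRANSLATION_PREFIXES = ("tcg_", "liveness_pass", "disas_insn")

# Matches rows such as "  5.69%  qemu-x86_64  qemu-x86_64  [.] tcg_gen_code",
# with an optional leading "+" so rows pasted from the diff also work.
ROW_RE = re.compile(r"^\+?\s*(\d+\.\d+)%\s+\S+\s+\S+\s+\[(.)\]\s+(\S+)")


def tally(report):
    """Sum the overhead percentages of a perf report by rough category."""
    totals = {"translation": 0.0, "kernel": 0.0, "other": 0.0}
    for line in report.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # header, comment, or "[...]" line
        percent = float(match.group(1))
        kind, symbol = match.group(2), match.group(3)
        if kind == "k":
            totals["kernel"] += percent
        elif symbol.startswith(TRANSLATION_PREFIXES):
            totals["translation"] += percent
        else:
            totals["other"] += percent
    return totals


# First five rows of the Perl profile from the README update.
SAMPLE = """\
# Overhead  Command      Shared Object      Symbol
    22.93%  qemu-x86_64  [kernel.kallsyms]  [k] isolate_freepages_block
     9.38%  qemu-x86_64  qemu-x86_64        [.] cpu_exec
     5.69%  qemu-x86_64  qemu-x86_64        [.] tcg_gen_code
     5.30%  qemu-x86_64  qemu-x86_64        [.] tcg_optimize
     3.45%  qemu-x86_64  qemu-x86_64        [.] liveness_pass_1
"""

if __name__ == "__main__":
    for category, percent in tally(SAMPLE).items():
        print(f"{category}: {percent:.2f}%")
```

Feeding it the full output of `perf report` for each benchmark would give one translation/kernel/other split per profile. The split is only as good as the prefix list, so treat the totals as a coarse indicator.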