v2: https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06805.html
Changes since v2:

- Add R-b tags

- Add a patch to rename our canonicalize to sf_canonicalize, to avoid
  clashing with glibc's.

- Add a patch to define float{32,64}_is_zero_or_normal

- Simplify the float{32,64}_input_flushX macros -- now the macros are
  more verbose but the full function names are greppable.

- Move tests/fp-test to tests/fp, since now both fp-bench and fp-test
  are under tests/fp.
  + Use tests/fp/fp-test.h for helpers common to both fp-bench and
    fp-test.

- Complete rewrite of fp-bench:
  + We can now directly call the softfloat functions, thereby making
    the benchmark more sensitive to changes to those functions.
  + We can still use the native ops with "-t host".
  + The rewrite also has less macro trickery; we rely instead on
    constant propagation by the compiler.
  + Alex: dropped your R-b since this changed a lot. I think you'll
    like this version better though!

- Define a generic function to generate the hardfloat implementation
  for ops with 2 inputs; add, sub, mul and div depend on it. Instead
  of using macros, rely on the constant propagation done by the
  compiler.
  [Alex: I dropped your R-b for the addsub patch because it changed
  a lot]
  + I kept macros for other ops, because I think the subsequent code
    duplication savings are worth the pain.

- Add #define's to select whether to use fpclassify etc. or
  float32_is_zero etc.
  + Benchmark perf differences on x86_64, aarch64 and IBM Power8
    hosts.
  + For 32-bit we don't use fpclassify etc. for any architectures, so
    I was tempted to get rid of this option to save some code. It's
    possible however that on some hosts I have not tested this option
    might pay off, so I decided to keep it there.

- Add a #define to select whether to use isinf() or
  floatX_is_infinity(). Turns out this makes a big difference for
  power64.

- Remove float32_to_float64 support in hardfloat, since nbench or SPEC
  actually showed a small yet measurable slowdown with it, despite
  fp-bench showing a significant speedup for this operation.

- Do not flatten soft-fp functions; these are now slow paths. This
  shrinks the size of the softfloat object below its original size
  (see last patch's log).

- Add a #define to disable hardfloat for some targets. I noticed that
  some targets (at least PPC; there might be others) clear the FP
  flags before calling softfloat. This precludes hardfloat, since it
  relies on the inexact flag already being set. In the long run we
  should fix these targets though.

Note: fp-bench can run _very_ slowly (~0.5 IPC) for -o fma on some
x86_64 hosts. I have not pinned down what's going on, but from the
few hosts I have access to, it seems that machines that have been
patched for Spectre/Meltdown are susceptible to this slowdown.
Fortunately though:
  1) when fma is run in QEMU (and not under a microbenchmark such as
     fp-bench), fma performance is still very good (much better than
     with soft-fp).
  2) Compiling with -march=native gets rid of the problem.
I've reproduced this with both gcc 5.4.0 and gcc 7.1.0. The *very*
same fp-bench binary that performs very well for FMA on two machines
(one AMD, one Intel, neither patched against Meltdown/Spectre)
performs below soft-fp on another three machines (all Intel, all
patched).

Note: there are some checkpatch errors, but they are false positives.

Perf numbers for fp-bench are in each commit log; numbers for several
benchmarks are in the last patch's commit log.
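
For readers unfamiliar with the approach, here is a minimal sketch of
the generic two-input function idea from the changelog. It is not the
code from this series: fp_status, f32_gen2, soft_add, soft_mul and the
other names are made up for illustration, and the slow path is a
placeholder that only counts how often it runs. The fast-path
conditions shown (inexact flag already set, zero-or-normal inputs,
result safely away from overflow and underflow) are an assumption
based on the description in this letter; a real implementation would
likely need further checks.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-in for the emulated FPU state; not QEMU's type. */
typedef struct {
    bool inexact;         /* sticky "inexact" exception flag */
    unsigned slow_calls;  /* demo only: how often the slow path ran */
} fp_status;

typedef float (*hard_op2)(float a, float b);
typedef float (*soft_op2)(float a, float b, fp_status *s);

static inline bool f32_is_zero_or_normal(float a)
{
    int c = fpclassify(a);
    return c == FP_ZERO || c == FP_NORMAL;
}

/*
 * Generic two-input op. Since it is inlined into each thin wrapper
 * below, 'hard' and 'soft' are compile-time constants there, so the
 * compiler can propagate them and emit direct calls. No macros needed.
 */
static inline float f32_gen2(float a, float b, fp_status *s,
                             hard_op2 hard, soft_op2 soft)
{
    assert(s);
    if (s->inexact && f32_is_zero_or_normal(a) && f32_is_zero_or_normal(b)) {
        float r = hard(a, b);
        /* Only trust the host result when it is finite and clearly
         * above the smallest normal; otherwise flags may be needed. */
        if (isfinite(r) && fabsf(r) > FLT_MIN) {
            return r;
        }
    }
    return soft(a, b, s);
}

static float host_add(float a, float b) { return a + b; }
static float host_mul(float a, float b) { return a * b; }

/* Placeholder slow paths: a real emulator would do the computation in
 * software and update the exception flags. These only count calls. */
static float soft_add(float a, float b, fp_status *s)
{
    s->slow_calls++;
    return a + b;
}

static float soft_mul(float a, float b, fp_status *s)
{
    s->slow_calls++;
    return a * b;
}

float f32_add(float a, float b, fp_status *s)
{
    return f32_gen2(a, b, s, host_add, soft_add);
}

float f32_mul(float a, float b, fp_status *s)
{
    return f32_gen2(a, b, s, host_mul, soft_mul);
}
```

With this shape, a target that clears the flags before every call
(so that inexact is never set on entry) would always end up in the
slow path, which is why such targets cannot benefit.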

You can fetch this series from:
  https://github.com/cota/qemu/tree/hardfloat-v3

Thanks,

		Emilio
---
 configure                   |    2 +
 fpu/softfloat.c             |  945 ++++++++++++++++++++++++++++++--
 include/fpu/softfloat.h     |   30 +
 target/tricore/fpu_helper.c |    9 +-
 tests/Makefile.include      |    3 +
 tests/fp/.gitignore         |    4 +
 tests/fp/Makefile           |   36 ++
 tests/fp/fp-bench.c         |  528 ++++++++++++++++++
 tests/fp/fp-test.c          | 1183 ++++++++++++++++++++++++++++++++++++++++
 tests/fp/muladd.fptest      |   51 ++
 10 files changed, 2737 insertions(+), 54 deletions(-)
 create mode 100644 tests/fp/.gitignore
 create mode 100644 tests/fp/Makefile
 create mode 100644 tests/fp/fp-bench.c
 create mode 100644 tests/fp/fp-test.c
 create mode 100644 tests/fp/muladd.fptest