Thomas Huth <th...@redhat.com> writes:
> On 2019-01-15 21:05, Emilio G. Cota wrote:
>> On Tue, Jan 15, 2019 at 16:01:32 +0000, Alex Bennée wrote:
>>> Ahh I should have mentioned we already have the technology for this ;-)
>>>
>>> If you build the fpu/next tree on a s390x you can then run:
>>>
>>>   ./tests/fp/fp-bench f64_div
>>>
>>> with and without the CONFIG_128 path. To get an idea of the real world
>>> impact you can compile a foreign binary and run it on a s390x system
>>> with:
>>>
>>>   $QEMU ./tests/fp/fp-bench f64_div -t host
>>>
>>> And that will give you the peak performance, assuming your program is
>>> doing nothing but f64_div operations. If the two QEMUs are basically in
>>> the same ballpark then it doesn't make enough difference. That said:
>>
>> I think you mean here `tests/fp/fp-bench -o div -p double', otherwise
>> you'll get the default op (-o add).
>
> I tried that now, too, and -o div -p double does not really seem to
> exercise this function at all.

How do you mean? It should do, because by default it should be calling
the softfloat implementations.

> Here are my results (disclaimer: that system is likely not really usable
> for benchmarks since its CPUs are shared with other LPARs, but I ran
> all the tests at least twice and got similar results):
>
> With the DLGR inline assembly:
> <snip>
> time ./fp-bench -o div -p double
> 204.98 MFlops
> <snip>
>
> With the "#else" default 64-bit code:
> <snip>
> time ./fp-bench -o div -p double
> 205.41 MFlops
> <snip>
>
> With the new CONFIG_INT128 code:
> <snip>
> time ./fp-bench -o div -p double
> 205.17 MFlops
> <snip>
>
> ==> The new CONFIG_INT128 code is really worse than the 64-bit code, so
> I don't think we should include this yet (unless we know a system where
> the compiler can create optimized assembly code without libgcc here).

To me that looks like it is easily in the noise range, and that the
dlgr instruction didn't actually beat the unrolled 64-bit code - which
is just weird.

--
Alex Bennée
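
[For readers not familiar with the code being benchmarked above, here is a
minimal sketch of the three 128-by-64-bit division strategies being compared.
It is an illustration under stated assumptions, not the actual QEMU source:
the helper name udiv128_64 is invented for this example, __SIZEOF_INT128__
stands in for QEMU's CONFIG_INT128 switch, and the portable branch uses a
simple bit-at-a-time loop where QEMU's real fallback is an unrolled 64-bit
algorithm.]

  #include <stdint.h>

  /*
   * Divide the 128-bit value n1:n0 by d.  Returns the quotient and stores
   * the remainder in *r.  Callers must guarantee n1 < d so that the
   * quotient fits in 64 bits.
   */
  static inline uint64_t udiv128_64(uint64_t *r, uint64_t n1,
                                    uint64_t n0, uint64_t d)
  {
  #if defined(__s390x__) && !defined(__clang__)
      /* DLGR divides an even/odd 128-bit register pair by a 64-bit value;
       * using an __int128 operand makes the compiler allocate such a pair. */
      unsigned __int128 n = ((unsigned __int128)n1 << 64) | n0;
      asm("dlgr %0, %1" : "+r"(n) : "r"(d));
      *r = n >> 64;               /* remainder ends up in the high half */
      return n;                   /* quotient in the low half */
  #elif defined(__SIZEOF_INT128__)
      /* Plain __int128 arithmetic: unless the target has a native 128/64
       * divide, the compiler emits calls into libgcc, which is why this
       * path is not automatically a win. */
      unsigned __int128 n = ((unsigned __int128)n1 << 64) | n0;
      *r = (uint64_t)(n % d);
      return (uint64_t)(n / d);
  #else
      /* Portable fallback: restoring shift-and-subtract long division,
       * one quotient bit per iteration (QEMU's real fallback is a faster,
       * unrolled variant of the same idea). */
      uint64_t q = 0;
      for (int i = 0; i < 64; i++) {
          uint64_t carry = n1 >> 63;      /* bit shifted out of n1 */
          n1 = (n1 << 1) | (n0 >> 63);
          n0 <<= 1;
          q <<= 1;
          if (carry || n1 >= d) {
              n1 -= d;
              q |= 1;
          }
      }
      *r = n1;
      return q;
  #endif
  }

[On hosts without a native 128/64 divide instruction, the __int128 branch
typically compiles down to a call into libgcc (__udivti3/__umodti3), which
would be consistent with it showing no advantage over the portable code in
the numbers quoted above.]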