Thomas Huth <th...@redhat.com> writes:

> On 2019-01-15 21:05, Emilio G. Cota wrote:
>> On Tue, Jan 15, 2019 at 16:01:32 +0000, Alex Bennée wrote:
>>> Ahh I should have mentioned we already have the technology for this ;-)
>>>
>>> If you build the fpu/next tree on a s390x you can then run:
>>>
>>>   ./tests/fp/fp-bench f64_div
>>>
>>> with and without the CONFIG_128 path. To get an idea of the real world
>>> impact you can compile a foreign binary and run it on a s390x system
>>> with:
>>>
>>>   $QEMU ./tests/fp/fp-bench f64_div -t host
>>>
>>> And that will give you the peak performance assuming your program is
>>> doing nothing but f64_div operations. If the two QEMUs are basically in
>>> the same ballpark then it doesn't make enough difference. That said:
>>
>> I think you mean here `tests/fp/fp-bench -o div -p double', otherwise
>> you'll get the default op (-o add).
>
> I tried that now, too, and -o div -p double does not really seem to
> exercise this function at all.

How do you mean? It should, because by default fp-bench calls the
softfloat implementations rather than the host FPU.

> Here are my results (disclaimer: that system is likely not really usable
> for benchmarks since its CPUs are shared with other LPARs, but I ran
> all the tests at least twice and got similar results):
>
>
> With the DLGR inline assembly:
>
<snip>
>  time ./fp-bench -o div -p double
>  204.98 MFlops
<snip>
> With the "#else" default 64-bit code:
>
<snip>
>  time ./fp-bench -o div -p double
>  205.41 MFlops
<snip>
> With the new CONFIG_INT128 code:
>
<snip>
>  time ./fp-bench -o div -p double
>  205.17 MFlops
<snip>
>
>
> ==> The new CONFIG_INT128 code is no faster than the 64-bit code, so
> I don't think we should include this yet (unless we know of a system
> where the compiler can generate optimized assembly here without calling
> into libgcc).

To me that looks like it's all easily within the noise, and that the
dlgr instruction didn't actually beat the unrolled 64-bit code - which
is just weird.
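
For anyone who wants to poke at this further, the three variants being
compared boil down to roughly the sketch below. This is my own rough
reconstruction, not the exact fpu/next code - the function name and the
portable fallback are simplified, and CONFIG_INT128 is QEMU's
configure-time flag (the __SIZEOF_INT128__ check is only there so the
standalone sketch compiles):

  /*
   * Divide the 128-bit value n1:n0 by d, returning the quotient and
   * storing the remainder in *r.  The caller is assumed to guarantee
   * n1 < d so the quotient fits in 64 bits.
   */
  #include <stdint.h>

  static inline uint64_t udiv128_sketch(uint64_t *r, uint64_t n1,
                                        uint64_t n0, uint64_t d)
  {
  #if defined(__s390x__) && defined(__GNUC__) && !defined(__clang__)
      /* dlgr divides an even/odd 64-bit register pair by a 64-bit
       * value; using an __int128 operand gets gcc to allocate such a
       * pair, with the remainder left in the high half and the
       * quotient in the low half. */
      unsigned __int128 n = ((unsigned __int128)n1 << 64) | n0;
      asm("dlgr %0, %1" : "+r"(n) : "r"(d));
      *r = n >> 64;
      return n;
  #elif defined(CONFIG_INT128) || defined(__SIZEOF_INT128__)
      /* Let the compiler do it.  Without a native 128/64 divide this
       * usually becomes a call to libgcc's __udivti3, which is why it
       * shows no speed-up over the fallback below. */
      unsigned __int128 n = ((unsigned __int128)n1 << 64) | n0;
      *r = n % d;
      return n / d;
  #else
      /* Portable fallback: bit-at-a-time restoring division.  (The
       * real fallback is a smarter long division, but this shows the
       * kind of 64-bit-only work the other paths try to avoid.) */
      uint64_t q = 0;
      for (int i = 63; i >= 0; i--) {
          uint64_t carry = n1 >> 63;
          n1 = (n1 << 1) | ((n0 >> i) & 1);
          if (carry || n1 >= d) {
              n1 -= d;
              q |= UINT64_C(1) << i;
          }
      }
      *r = n1;
      return q;
  #endif
  }

Which also ties in with Thomas's parenthetical above: the __int128 path
can only win on hosts where the compiler inlines the division instead
of calling out to libgcc.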

--
Alex Bennée
