[Bug libgcc/108279] Improved speed for float128 routines

already5chosen at yahoo dot com via Gcc-bugs Thu, 12 Jan 2023 17:29:49 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279


--- Comment #8 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Thomas Koenig from comment #6)
> (In reply to Michael_S from comment #5)
> > Hi Thomas
> > Are you in or out?
> 
> Depends a bit on what exactly you want to do, and if there is
> a chance that what you want to do will be incorporated into gcc.
> 

What about incorporation in Fortran?
What about incorporation in C under fast-math?

> If you want to replace the soft-float routines, you will have to
> replace them with the full functionality.
> 

Full functionality, including the Inexact Exception that practically nobody
uses? That sounds like a waste of perfectly good CPU cycles.
Also, I am not so sure that the Inexact Exception is fully supported in the
existing soft-float library.

Almost-full functionality, with support for non-default rounding modes but
without the Inexact Exception?
I actually implemented it and did a few measurements. You can find the results
in the /reports directory in my repo.
Summary: the architecture-neutral method causes a very serious slowdown. It is
less severe on slower machines, but a massive 2.5x on the fastest machine (Zen3
under Linux under WSL).
The AMD64-specific method causes a smaller slowdown, especially on relatively
old Intel cores under Windows (I have no modern Intel cores available for
testing). But Zen3/Linux still suffers a 1.45x slowdown. Again, a big waste of
perfectly good CPU cycles.
Also, what about other architectures? Should they suffer the
"architecture-neutral" slowdown? Even if faster methods exist on other
architectures, somebody has to find them and somebody has to test them.
This sort of work is time-consuming. And for what?

Also, I measured the impact of implementing non-default rounding through an
additional function parameter. The impact is very small, 0 to 5%.
You said on comp.arch that, at least for Fortran, this could work.
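To illustrate what "rounding through an additional function parameter" can
look like, here is a minimal sketch of the final rounding step of a soft-float
routine, with the mode passed in explicitly instead of read from global FPU
state. The helper round_shift and the enum rmode are hypothetical, not
libgcc's API. It drops the low `shift` bits of a significand magnitude and
rounds according to `rm`; the discarded bits being nonzero is exactly the
condition under which IEEE 754 would signal Inexact.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rounding-mode codes, passed as an ordinary argument. */
enum rmode { RM_NEAREST, RM_DOWN, RM_UP, RM_ZERO };

/* Drop the low `shift` bits (1..63) of magnitude `sig`, rounding per `rm`.
   `neg` is the sign of the value, which matters for the directed modes. */
static uint64_t round_shift(uint64_t sig, unsigned shift, bool neg,
                            enum rmode rm)
{
    uint64_t kept = sig >> shift;
    uint64_t rest = sig & ((UINT64_C(1) << shift) - 1); /* discarded bits */
    uint64_t half = UINT64_C(1) << (shift - 1);
    bool inc = false;

    if (rest != 0) {            /* result is inexact */
        switch (rm) {
        case RM_NEAREST:        /* round half to even */
            inc = rest > half || (rest == half && (kept & 1));
            break;
        case RM_UP:   inc = !neg; break; /* toward +inf grows positives */
        case RM_DOWN: inc = neg;  break; /* toward -inf grows negatives */
        case RM_ZERO: inc = false; break;
        }
    }
    return kept + inc;
}
```

Because the mode is just an argument, the caller (for example a Fortran
runtime that tracks its own rounding mode) pays only for passing one extra
integer, which is consistent with the small overhead reported above.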

What else is missing for "full functionality"?
Surely there are other things that I forgot. Maybe additional exceptions, apart
from Invalid Operand (which hopefully already works) and apart from Inexact,
which I find stupid? I don't think they are hard to implement or expensive in
terms of speed. Just a bit of work and more than a bit of testing.

> And there will have to be a decision about 32-bit targets.
>

IMHO, 32-bit targets should be left in their current state.
People who use them probably do not care deeply about performance.
Technically, I can support 32-bit targets in the same sources by means of a
few ifdefs and macros, but the resulting source code will look much uglier than
it looks today. Still not the same level of horror as in matmul_r16.c, but
certainly uglier than I like it to look.
And I am not at all sure that my implementation for 32-bit targets would be
significantly faster than the current soft float.
In short, it does not sound like a good ROI.
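A minimal sketch of the kind of ifdef the email alludes to, assuming the core
primitive is a 64x64 -> 128-bit multiply: where GCC provides unsigned __int128
(signalled by the predefined macro __SIZEOF_INT128__) the code uses it,
otherwise it falls back to schoolbook multiplication on 32-bit halves. The
function names are hypothetical and this is not taken from the author's repo.

```c
#include <stdint.h>

/* Fallback for targets without a native 128-bit type: combine four
   32x32 -> 64 partial products into a 128-bit result (hi, lo). */
static void mul64x64_portable(uint64_t a, uint64_t b,
                              uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* middle column: carry from p00 plus low halves of the cross terms */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    *lo = (mid << 32) | (uint32_t)p00;
}

/* Select the implementation at compile time. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#if defined(__SIZEOF_INT128__)
    unsigned __int128 p = (unsigned __int128)a * b;  /* 64-bit targets */
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
#else
    mul64x64_portable(a, b, hi, lo);                 /* 32-bit targets */
#endif
}
```

Every such primitive needs its own pair of branches, which is where the
ugliness (and the extra testing burden) comes from.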
BTW, do you know why the current soft float supports so few 32-bit targets?
Most likely somebody felt just like me about it: it's not too hard to support
more 32-bit targets, but it's not a good ROI.
