https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279
--- Comment #8 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Thomas Koenig from comment #6)
> (In reply to Michael_S from comment #5)
> > Hi Thomas
> > Are you in or out?
>
> Depends a bit on what exactly you want to do, and if there is
> a chance that what you want to do will be incorporated into gcc.
>

What about incorporation in Fortran?
What about incorporation in C under fast-math?

> If you want to replace the soft-float routines, you will have to
> replace them with the full functionality.
>

Full functionality including the Inexact Exception that practically nobody uses?
That sounds like a waste of perfectly good CPU cycles.
Also, I am not so sure that the Inexact Exception is fully supported in the existing soft-float library.

Almost-full functionality, with support for non-default rounding modes but without the Inexact Exception?
I actually implemented it and did a few measurements. You can find the results in the directory /reports in my repo.
Summary: the architecture-neutral method causes a very serious slowdown. It is smaller on slower machines and a massive 2.5x on the fastest machine (Zen3 under Linux under WSL).
The AMD64-specific method causes a smaller slowdown, especially on relatively old Intel cores on Windows (I have no modern Intel cores available for testing). But Zen3/Linux still suffers a 1.45x slowdown. Again, a big waste of perfectly good CPU cycles.
Also, what about other architectures? Should they suffer the "architecture-neutral" slowdown? Even if there are faster methods on other architectures, somebody has to find those methods and somebody has to test them. This sort of work is time-consuming. And for what?

I also measured the impact of implementing non-default rounding through an additional function parameter. The impact is very small, 0 to 5%.
You said on comp.arch that at least for Fortran it could work.

What else is missing for "full functionality"?
Surely there are other things that I forgot.
Maybe additional exceptions, apart from Invalid Operand (which hopefully already works) and apart from Inexact, which I find stupid?
I don't think they are hard to implement or expensive in terms of speed. Just a bit of work and more than a bit of testing.

> And there will have to be a decision about 32-bit targets.
>

IMHO, 32-bit targets should be left in their current state.
People who use them probably do not care deeply about performance.
Technically, I can implement 32-bit targets in the same sources by means of a few ifdefs and macros, but the resulting source code will look much uglier than it looks today. Still not the same level of horror that you have in matmul_r16.c, but certainly uglier than I like it to look.
And I am not at all sure that my implementation for 32-bit targets would be significantly faster than the current soft float.
In short, it does not sound like a good ROI.
BTW, do you know why the current soft float supports so few 32-bit targets? Most likely somebody felt just like me about it: it's not too hard to support more 32-bit targets, but it's not a good ROI.
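
For illustration, passing the rounding mode as an additional function parameter (the variant measured above at 0 to 5% cost) could look roughly like the C sketch below. This is only a sketch under stated assumptions: the names rm_t and round_shift_right are hypothetical and are not taken from libgcc or from the repository mentioned above. It covers only the final rounding step, applied to a 64-bit significand magnitude, and raises no exceptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rounding-mode type; not an existing libgcc or <fenv.h> name. */
typedef enum {
    RM_NEAREST_EVEN, /* round to nearest, ties to even */
    RM_TOWARD_ZERO,  /* truncate */
    RM_UPWARD,       /* toward +infinity */
    RM_DOWNWARD      /* toward -infinity */
} rm_t;

/*
 * Drop the low `shift` bits (1..63) of a significand magnitude `mant`,
 * rounding according to the caller-supplied mode instead of reading the
 * floating-point environment. `negative` is the sign of the value and
 * matters only for the two directed modes.
 */
static uint64_t round_shift_right(uint64_t mant, unsigned shift,
                                  int negative, rm_t rm)
{
    assert(shift >= 1 && shift <= 63);

    uint64_t kept = mant >> shift;                      /* surviving bits */
    uint64_t rem  = mant & ((UINT64_C(1) << shift) - 1); /* discarded bits */
    uint64_t half = UINT64_C(1) << (shift - 1);          /* half of one ULP */
    int inc = 0;

    switch (rm) {
    case RM_NEAREST_EVEN:
        inc = rem > half || (rem == half && (kept & 1));
        break;
    case RM_TOWARD_ZERO:
        inc = 0;
        break;
    case RM_UPWARD:   /* magnitude grows only for positive values */
        inc = rem != 0 && !negative;
        break;
    case RM_DOWNWARD: /* magnitude grows only for negative values */
        inc = rem != 0 && negative;
        break;
    }
    /* A real routine must also handle a carry out of the significand. */
    return kept + (uint64_t)inc;
}
```

With the mode as an ordinary argument, the default case can pass a constant RM_NEAREST_EVEN, so the compiler is free to specialize or inline that path, while a Fortran front end could pass whatever mode is in effect at the call site.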