https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279
--- Comment #22 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Michael_S from comment #8)
> (In reply to Thomas Koenig from comment #6)
> > And there will have to be a decision about 32-bit targets.
> >
>
> IMHO, 32-bit targets should be left in their current state.
> People that use them probably do not care deeply about performance.
> Technically, I can implement 32-bit targets in the same sources, by means of
> few ifdefs and macros, but resulting source code will look much uglier than
> how it looks today. Still, not to the same level of horror that you have in
> matmul_r16.c, but certainly uglier than how I like it to look.
> And I am not sure at all that my implementation of 32-bit targets would be
> significantly faster than current soft float.

I explored this path (implementing 32-bit and 64-bit targets from the same
source with a few ifdefs) a little more.
I am now even more sure that it is not the way to go. gcc does not generate
good 32-bit code for this style of source. This applies especially to i386;
the other supported 32-bit targets (RV32, SPARC32) are affected less.

In the process I encountered a funny, illogical pessimization by the i386 code
generator:
https://godbolt.org/z/En6Tredox

Routines foo32() and foo64() are semantically identical, but foo32() is written
with 32-bit targets in mind, while foo64() is in the style of code that will
likely be written if one wants to support 32 and 64 bits from the same source
with #ifdef.

The code generated by gcc for foo32() is reasonable. As in the source, we
can see 8 multiplications.
The code generated by gcc for foo64() is anything but reasonable. Somehow, the
compiler invented 5 more multiplications, for a total of 13.

Maybe it deserves a separate bug report.
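
For readers who cannot open the godbolt link, here is a rough sketch of the two source styles being contrasted. It is NOT the code behind the link: the names mul_limbs32(), mul_words64() and check_styles_agree() are made up for illustration, and the operation (a 64x64 -> 128-bit product) is only an assumed stand-in for whatever foo32()/foo64() compute. Whether this sketch triggers the same extra multiplications on i386 has not been verified.

```c
#include <assert.h>
#include <stdint.h>

/* "32-bit style" (hypothetical): operands and result are little-endian
 * arrays of 32-bit limbs, so every multiplication is visibly a
 * 32x32 -> 64-bit widening multiply. */
void mul_limbs32(uint32_t r[4], const uint32_t a[2], const uint32_t b[2])
{
    uint64_t p00 = (uint64_t)a[0] * b[0];
    uint64_t p01 = (uint64_t)a[0] * b[1];
    uint64_t p10 = (uint64_t)a[1] * b[0];
    uint64_t p11 = (uint64_t)a[1] * b[1];

    /* Sum of three values below 2^32 each: cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    uint64_t hi  = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);

    r[0] = (uint32_t)p00;
    r[1] = (uint32_t)mid;
    r[2] = (uint32_t)hi;
    r[3] = (uint32_t)(hi >> 32);
}

/* "64-bit style" (hypothetical): everything is held in uint64_t and split
 * with shifts and masks. In the source each product is a 64x64 multiply;
 * a 32-bit code generator has to prove that the upper halves are zero
 * before it can use a single widening multiply per product. */
void mul_words64(uint64_t *lo, uint64_t *hi, uint64_t a, uint64_t b)
{
    uint64_t a0 = a & 0xFFFFFFFFu, a1 = a >> 32;
    uint64_t b0 = b & 0xFFFFFFFFu, b1 = b >> 32;

    uint64_t p00 = a0 * b0, p01 = a0 * b1;
    uint64_t p10 = a1 * b0, p11 = a1 * b1;

    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);

    *lo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Checks that both styles produce the same 128-bit product. */
void check_styles_agree(uint64_t a, uint64_t b)
{
    uint32_t al[2] = { (uint32_t)a, (uint32_t)(a >> 32) };
    uint32_t bl[2] = { (uint32_t)b, (uint32_t)(b >> 32) };
    uint32_t r[4];
    uint64_t lo, hi;

    mul_limbs32(r, al, bl);
    mul_words64(&lo, &hi, a, b);

    assert(lo == (((uint64_t)r[1] << 32) | r[0]));
    assert(hi == (((uint64_t)r[3] << 32) | r[2]));
}
```

Both routines contain four multiplications in the source. Compiling them with -m32 -O2 and counting the mul/imul instructions in each is one way to compare how the code generator treats the two styles.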