[Bug libgcc/108279] Improved speed for float128 routines

already5chosen at yahoo dot com via Gcc-bugs Fri, 10 Feb 2023 05:38:19 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279


--- Comment #24 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Michael_S from comment #22)
> (In reply to Michael_S from comment #8)
> > (In reply to Thomas Koenig from comment #6)
> > > And there will have to be a decision about 32-bit targets.
> > >
> > 
> > IMHO, 32-bit targets should be left in their current state.
> > People that use them probably do not care deeply about performance.
> > Technically, I can implement 32-bit targets in the same sources, by means of
> > few ifdefs and macros, but resulting source code will look much uglier than
> > how it looks today. Still, not to the same level of horror that you have in
> > matmul_r16.c, but certainly uglier than how I like it to look.
> > And I am not sure at all that my implementation of 32-bit targets would be
> > significantly faster than current soft float.
> 
> I explored this path (implementing 32-bit and 64-bit targets from the same
> source with few ifdefs) a little more:
> Now I am even more sure that it is not a way to go. gcc compiler does not
> generate good 32-bit code for this style of sources. This especially applies
> to i386, other supported 32-bit targets (RV32, SPARC32) are affected less.
> 

I can't explain to myself why I am doing it, but I did continue exploration of
32-bit targets. Well, not quite "targets", I don't have SPARC32 or RV32 to
play. So, I did continue exploration of i386.
As said above, using the same code for 32-bit and 64-bit does not produce
acceptable results. But pure 32-bit source did better than what I expected.
So when 2023-01-13 I wrote "And I am not sure at all that my implementation of
32-bit targets would be significantly faster than current soft float" I was
wrong. My implementation of 32-bit targets (i.e. i386) is significantly faster
than current soft float. Up to 3 times faster on Zen3, approximately 2 times
faster on various oldish Intel CPUs.
Today I put 32-bit sources into my github repository.

I am still convinced that improving performance of IEEE binary128 on 32-bit
targets is wastage of time, but since the time is already wasted may be results
can be used.

And may be, it can be used to bring IEEE binary128 to the Arm Cortex-M, where
it can be moderately useful in some situations.

[Bug libgcc/108279] Improved speed for float128 routines

Reply via email to