[Bug libgcc/108279] Improved speed for float128 routines

already5chosen at yahoo dot com via Gcc-bugs Wed, 18 Jan 2023 15:31:50 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279


--- Comment #23 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Jakub Jelinek from comment #19)
> So, if stmxcsr/vstmxcsr is too slow, perhaps we should change x86
> sfp-machine.h
> #define FP_INIT_ROUNDMODE                                       \
>   do {                                                          \
>     __asm__ __volatile__ ("%vstmxcsr\t%0" : "=m" (_fcw));       \
>   } while (0)
> #endif
> to do that only if not round-to-nearest.
> E.g. the fast-float library uses since last November:
>   static volatile float fmin = __FLT_MIN__;
>   float fmini = fmin; // we copy it so that it gets loaded at most once.
>   return (fmini + 1.0f == 1.0f - fmini);
> as a quick test whether round-to-nearest is in effect or some other rounding
> mode.
> Most likely if this is done with -frounding-math it wouldn't even need the
> volatile stuff.  Of course, if it isn't round-to-nearest, we would need to
> do the expensive {,v}stmxcsr.

I agree with Wilco. This trick is problematic due to its effect on the inexact flag.
Also, I don't quite understand how you arrived at setting the rounding mode.
I don't need to set the rounding mode; I just need to read the current one.

Doing it the portable way, i.e. with fegetround(), is slow, mostly due to
various overheads.
Doing it the non-portable way on x86-64 (with _MM_GET_ROUNDING_MODE()) is not
slow on Intel, but is still fairly slow on AMD Zen3, although even on Zen3 it
is much faster than fegetround().
The measurement results are here:
https://github.com/already5chosen/extfloat/blob/master/binary128/reports/rm-impact.txt
Anyway, I'd very much prefer a portable solution over a multitude of #ifdefs.
It is a pity that gcc doesn't implement FLT_ROUNDS, reflecting the current
dynamic rounding mode, the way other compilers do.

But then again, it is a pity that gcc doesn't implement a few other things,
already implemented by other compilers, that would make life so much easier
for developers of portable multiprecision routines in general, and of soft
float in particular.
