https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279
--- Comment #23 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Jakub Jelinek from comment #19)
> So, if stmxcsr/vstmxcsr is too slow, perhaps we should change x86
> sfp-machine.h
> #define FP_INIT_ROUNDMODE \
> do { \
> __asm__ __volatile__ ("%vstmxcsr\t%0" : "=m" (_fcw)); \
> } while (0)
> #endif
> to do that only if not round-to-nearest.
> E.g. the fast-float library uses since last November:
> static volatile float fmin = __FLT_MIN__;
> float fmini = fmin; // we copy it so that it gets loaded at most once.
> return (fmini + 1.0f == 1.0f - fmini);
> as a quick test whether round-to-nearest is in effect or some other rounding
> mode.
> Most likely if this is done with -frounding-math it wouldn't even need the
> volatile stuff. Of course, if it isn't round-to-nearest, we would need to
> do the expensive {,v}stmxcsr.

I agree with Wilco. This trick is problematic because of its effect on the
inexact flag.

Also, I don't quite understand how you got to setting the rounding mode.
I don't need to set the rounding mode, I just need to read the current
rounding mode.

Doing it in a portable way, i.e. with fegetround(), is slow, mostly due to
various overheads.
Doing it in a non-portable way on x86-64 (with _MM_GET_ROUNDING_MODE()) is not
slow on Intel, but still pretty slow on AMD Zen3, although even on Zen3 it is
much faster than fegetround().
Results of measurements are here:
https://github.com/already5chosen/extfloat/blob/master/binary128/reports/rm-impact.txt

Anyway, I'd very much prefer a portable solution over a multitude of ifdefs.
It is a pity that gcc doesn't implement FLT_ROUNDS like other compilers.

But, then again, it is a pity that gcc doesn't implement a few other things
implemented by other compilers that could make the life of developers of
portable multiprecision routines in general, and of soft float in particular,
so much easier.
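
For reference, the two ways of reading the rounding mode discussed above can be
sketched roughly as follows. This is only an illustration, not code from
libgcc or from the extfloat repository: the helper names rm_is_nearest_query()
and rm_is_nearest_probe() are made up, and using __SSE__ to select the x86 path
is an assumption. The probe is the quoted fast-float test; its two additions
are inexact, so it would be expected to raise FE_INEXACT, which is the side
effect objected to above.

```c
#include <float.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* _MM_GET_ROUNDING_MODE, _MM_ROUND_NEAREST */
#else
#include <fenv.h>        /* fegetround, FE_TONEAREST */
#endif

/* Hypothetical helper: nonzero if round-to-nearest is in effect.
   On x86 with SSE this reads only MXCSR (the non-portable route);
   elsewhere it falls back to the portable fegetround(). */
static int rm_is_nearest_query(void)
{
#if defined(__SSE__)
    return _MM_GET_ROUNDING_MODE() == _MM_ROUND_NEAREST;
#else
    return fegetround() == FE_TONEAREST;
#endif
}

/* Hypothetical helper: the fast-float style probe quoted above.
   Under round-to-nearest both sums round to exactly 1.0f, so they
   compare equal; under a directed mode one of them moves away from
   1.0f. Both additions are inexact. */
static int rm_is_nearest_probe(void)
{
    static volatile float fmin = FLT_MIN;
    float fmini = fmin;  /* copy so the volatile is loaded only once */
    return fmini + 1.0f == 1.0f - fmini;
}
```

The #if in the first helper is exactly the kind of per-target ifdef that a
portable mechanism would make unnecessary.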