Re: [Mingw-w64-public] Implement fused multiply-add (FMA) funcitons for x86 families properly
> So you have decided that __builtins can't be used then? That's too bad. Yes it results in a call to `fma()` on x64. Can't test it on ARM though. > I know almost nothing about the guts of floating point, so I'm prepared > to defer to your judgement, but here's what I think: > > Let me propose an alternative for fma.c: > ... ... > In other words, remove all the platform specific code. This (greatly) > simplifies this file. You were already using fmal for x86. And it > doesn't lose anything for ARM, since both fma() and fmal() use the exact > same inline asm. Why have the exact same (hard to maintain) code in 2 > places? Keeping asm code in fmaf.c but not in fma.c seems style inconsistency. However the contrary is doable: In the case of ARM, call `fma()` in `fmal()`. > As for fmaf, what about: > ... ... > The case here is less compelling, but I assert that if fmal is > supported, it can always be used to calculate fmaf. If there is a > shorter/more efficient method (such as there is with ARM), it can be > added here. Fair enough. Updated. > As for fmal, I have a question about your code. Not the implementation, > but the design. Looking at https://en.wikipedia.org/wiki/Long_double, > it says "Microsoft Windows with Visual C++ also sets the processor in > double-precision mode by default." Since (it appears?) you aren't > following _controlfp_s, won't this give use a different answer than fmal > from msvcr120.dll? MSVC doesn't support 80-bit `long double` (it is 64 bits there) so the results can't equal unless it fits into 64 bits precisely. My FMA algorithm is basically splitting both operands into two 32-bit ones, multiplying them using elementary arithmetics then adding the four 64-bit results altogether: (a+b)(c+d) = ac+(bc+ad)+bd. So the precision of x87 indeed affects the result. I doubt whether it is necessary to save the x87 control word and set it to 64-bit precision before the calcuation and restore it thereafter. MinGW-w64 already sets it to 64-bit precision during CRT initialization, and if people set it lower they ain't going to need `fma()` either. An interesting look at https://msdn.microsoft.com/en-us/library/c9676k6h.aspx reminds me that _PC_64 isn't supported on x64. Sounds incredible, no? Does `_controlfp_s()` return an error if we try to set _PC_64 on 0x64? I have no idea. Nevertheless the precision flags can be set and restored using inline assembly - yet another dirty solution. > More nits: > > s/whecher/whether > s/#x86_Extended_Precision_Format/#x86_extended_precision_format Fixed. The bookmark to wikipedia was copied from my broswer half a year ago at least and it probably was modified. -- Best regards, lh_mouse 2017-01-20 -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot___ Mingw-w64-public mailing list Mingw-w64-public@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
Re: [Mingw-w64-public] Implement fused multiply-add (FMA) funcitons for x86 families properly
So you have decided that __builtins can't be used then? That's too bad. I know almost nothing about the guts of floating point, so I'm prepared to defer to your judgement, but here's what I think: Let me propose an alternative for fma.c: /** * This file has no copyright assigned and is placed in the Public Domain. * This file is part of the mingw-w64 runtime package. * No warranty is given; refer to the file DISCLAIMER.PD within this package. */ double fma(double x, double y, double z); long double fmal(long double x, long double y, long double z); double fma(double x, double y, double z){ return (double)fmal(x, y, z); } In other words, remove all the platform specific code. This (greatly) simplifies this file. You were already using fmal for x86. And it doesn't lose anything for ARM, since both fma() and fmal() use the exact same inline asm. Why have the exact same (hard to maintain) code in 2 places? As for fmaf, what about: /** * This file has no copyright assigned and is placed in the Public Domain. * This file is part of the mingw-w64 runtime package. * No warranty is given; refer to the file DISCLAIMER.PD within this package. */ float fmaf(float x, float y, float z); #if defined(_ARM_) || defined(__arm__) float fmaf(float x, float y, float z){ __asm__ ( "fmacs %0, %1, %2 \n" : "+t"(z) : "t"(x), "t"(y) ); return z; } #else long double fmal(long double x, long double y, long double z); float fmaf(float x, float y, float z){ return (float)fmal(x, y, z); } #endif The case here is less compelling, but I assert that if fmal is supported, it can always be used to calculate fmaf. If there is a shorter/more efficient method (such as there is with ARM), it can be added here. As for fmal, I have a question about your code. Not the implementation, but the design. Looking at https://en.wikipedia.org/wiki/Long_double, it says "Microsoft Windows with Visual C++ also sets the processor in double-precision mode by default." Since (it appears?) you aren't following _controlfp_s, won't this give use a different answer than fmal from msvcr120.dll? More nits: s/whecher/whether s/#x86_Extended_Precision_Format/#x86_extended_precision_format dw -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot ___ Mingw-w64-public mailing list Mingw-w64-public@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
Re: [Mingw-w64-public] Implement fused multiply-add (FMA) funcitons for x86 families properly
New patch attached. This patch fixes ARM functions and adds a check in `fpu_fma()` for potential NaN or INF results. -- Best regards, lh_mouse 2017-01-19 From 3c55daec84dac190b9e3cb032371960e1acbc38f Mon Sep 17 00:00:00 2001 From: LH_MouseDate: Wed, 18 Jan 2017 19:35:43 +0800 Subject: [PATCH] mingw-w64-crt/math/fma{,f,l}.c: Implement fused multiply-add (FMA) funcitons for x86 families properly. mingw-w64-crt/Makefile.am: Likewise. mingw-w64-crt/math/fma{,f}.S: Merge into corresponding C files with the same names, respectively. --- mingw-w64-crt/Makefile.am | 4 +- mingw-w64-crt/math/fma.S | 42 - mingw-w64-crt/math/fma.c | 31 ++ mingw-w64-crt/math/fmaf.S | 43 -- mingw-w64-crt/math/fmaf.c | 31 ++ mingw-w64-crt/math/fmal.c | 146 -- 6 files changed, 205 insertions(+), 92 deletions(-) delete mode 100644 mingw-w64-crt/math/fma.S create mode 100644 mingw-w64-crt/math/fma.c delete mode 100644 mingw-w64-crt/math/fmaf.S create mode 100644 mingw-w64-crt/math/fmaf.c diff --git a/mingw-w64-crt/Makefile.am b/mingw-w64-crt/Makefile.am index 44360db..5eba234 100644 --- a/mingw-w64-crt/Makefile.am +++ b/mingw-w64-crt/Makefile.am @@ -227,7 +227,6 @@ src_libmingwex=\ \ math/_chgsignl.S math/ceil.Smath/ceilf.S math/ceill.S math/copysignl.S \ math/floor.S math/floorf.S math/floorl.S \ - math/fma.Smath/fmaf.S\ math/nearbyint.S math/nearbyintf.S math/nearbyintl.S \ math/trunc.S math/truncf.S \ math/cbrt.c \ @@ -235,7 +234,8 @@ src_libmingwex=\ math/coshf.c math/coshl.c math/erfl.c \ math/expf.c \ math/fabs.c math/fabsf.c math/fabsl.c math/fdim.c math/fdimf.c math/fdiml.c \ - math/fmal.c math/fmax.cmath/fmaxf.c math/fmaxl.c math/fmin.c math/fminf.c \ + math/fma.cmath/fmaf.cmath/fmal.c \ + math/fmax.c math/fmaxf.c math/fmaxl.c math/fmin.c math/fminf.c \ math/fminl.c math/fp_consts.c math/fp_constsf.c \ math/fp_constsl.c math/fpclassify.c math/fpclassifyf.c math/fpclassifyl.c math/frexpf.c\ math/hypotf.c math/hypot.c math/hypotl.c math/isnan.c math/isnanf.cmath/isnanl.c\ diff --git a/mingw-w64-crt/math/fma.S b/mingw-w64-crt/math/fma.S deleted file mode 100644 index 74becde..000 --- a/mingw-w64-crt/math/fma.S +++ /dev/null @@ -1,42 +0,0 @@ -/** - * This file has no copyright assigned and is placed in the Public Domain. - * This file is part of the mingw-w64 runtime package. - * No warranty is given; refer to the file DISCLAIMER.PD within this package. - */ -#include <_mingw_mac.h> - - .file "fma.S" - .text -#ifdef __x86_64__ - .align 8 -#else - .align 4 -#endif - .p2align 4,,15 - .globl __MINGW_USYMBOL(fma) - .def__MINGW_USYMBOL(fma); .scl2; .type 32; .endef -__MINGW_USYMBOL(fma): -#if defined(_AMD64_) || defined(__x86_64__) - subq$56, %rsp - movsd %xmm0,(%rsp) - movsd %xmm1,16(%rsp) - movsd %xmm2,32(%rsp) - fldl(%rsp) - fmull 16(%rsp) - fldl32(%rsp) - faddp - fstpl (%rsp) - movsd (%rsp),%xmm0 - addq$56, %rsp - ret -#elif defined(_ARM_) || defined(__arm__) - fmacd d2, d0, d1 - fcpyd d0, d2 - bx lr -#elif defined(_X86_) || defined(__i386__) - fldl4(%esp) - fmull 12(%esp) - fldl20(%esp) - faddp - ret -#endif diff --git a/mingw-w64-crt/math/fma.c b/mingw-w64-crt/math/fma.c new file mode 100644 index 000..00f100c --- /dev/null +++ b/mingw-w64-crt/math/fma.c @@ -0,0 +1,31 @@ +/** + * This file has no copyright assigned and is placed in the Public Domain. + * This file is part of the mingw-w64 runtime package. + * No warranty is given; refer to the file DISCLAIMER.PD within this package. + */ +double fma(double x, double y, double z); + +#if defined(_AMD64_) || defined(__x86_64__) || defined(_X86_) || defined(__i386__) + +long double fmal(long double x, long double y, long double z); + +double fma(double x, double y, double z){ + return (double)fmal(x, y, z); +} + +#elif defined(_ARM_) || defined(__arm__) + +double fma(double x, double y, double z){ + __asm__ ( +"fmacd %0, %1, %2 \n" +: "+w"(z) +: "w"(x), "w"(y) + ); + return z; +} + +#else + +#error This platform is not supported. + +#endif diff --git a/mingw-w64-crt/math/fmaf.S b/mingw-w64-crt/math/fmaf.S deleted file mode 100644 index 6bc7ef0..000 --- a/mingw-w64-crt/math/fmaf.S +++ /dev/null @@ -1,43 +0,0 @@ -/** - * This