Re: [Mingw-w64-public] Implement fused multiply-add (FMA) functions for x86 families properly
> I see that you have replaced the x86 parts for fma and fmaf with C
> code. That seems like a good thing. Is there some reason you can't do
> that with the ARM versions too?

ARM has hardware FMA, and software emulation is not optimal.

> Reducing the amount of platform-specific code also seems like a good thing.

The x87 80-bit floating-point format is already platform-specific.

> There are a number of reasons not to use inline asm (for example
> https://gcc.gnu.org/wiki/DontUseInlineAsm ). Are you sure this is a
> good idea?

I am not sure about the inline asm itself. The primary reason I did that is that, if we have `fma.S` and `fma.c` in the same directory, they both compile to the same object file `fma.o`, and `make` complains about that.

Inline asm is indeed hard to maintain, and I am aware of that. Personally I only write asm statements that contain very few instructions, simulating builtin functions or intrinsics for use in C code.

> Yup, that's one of the downsides to using inline asm.
>
> I'm no ARM expert, but I'm not sure about this ARM code for fmal:
>
> +long double fmal(long double x, long double y, long double z){
> +  __asm__ (
> +"fmacd %2, %0, %1 \n"
> +"fcpyd %0, %2 \n"
> +: "+"(z)
> +: "w"(x), "w"(y)
> +  );
>
> Doesn't fmacd modify %2? That would be (y), which is listed as an input
> parameter (and therefore is read-only). What's more, I thought fmacd
> was calculating "Fd + Fn * Fm" where the parameters were "fmacd Fd, Fn,
> Fm". Such being the case, I would have expected "fmacd %0, %1 %2"? I
> don't have a way to run this either, but this looks wrong.

Thanks for pointing it out. That is a mistake. I forgot to fix it after copying it from the asm code. The `fma()` function was the correct one.

> Under the nit-picky heading:
>
> +double fma(double x, double y, double z){
> +  __asm__ (
> +"fmacd %0, %1, %2 \n"
> +: "+"(z)
> +: "w"(x), "w"(y)
> +  );
>
> The \n is redundant. And doesn't the + make the & redundant as well?

I just prefer to terminate every line of asm code with \n. I believe the & is redundant not only because of the +, but also because there is only one instruction, so nothing can be written before the other operands are read.

> Lastly I gotta ask: Can we use __builtin_fmal? Or is mingw-w64 the one
> providing the implementations for these?

We would have to ask a GCC developer to be sure. In my experience, a `__builtin_` function is guaranteed to be semantically equivalent to the standard library function without the prefix. Sometimes the compiler cannot assume that all functions from the standard C library are available and have the specified behavior, e.g. when compiling the Linux kernel. The `__builtin_fmal()` function is then considered to be a standard FMA, suitable for constant folding. It may result in an inline instruction where possible, but could also result in a call to the external `fmal()` function, resulting in infinite recursion if used in `fmal()`.

--
Best regards,
lh_mouse
2017-01-19

--
Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
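The operand-numbering point above can be tried without ARM hardware. The following is only a sketch under stated assumptions, not code from the patch: `add_in_place` is a hypothetical helper, and it uses an x86 `addl` instead of `fmacd`. What it has in common with the `fma()` version is the layout: operand %0 is the read-write one (the `+` modifier) and the remaining operands are read-only inputs.

```c
/* Hypothetical x86 analogue of the operand layout discussed above; it is
 * not part of the patch. Operand %0 is read-write ("+r"), like z in
 * "fmacd %0, %1, %2"; operand %1 is a read-only input. */
static int add_in_place(int acc, int x)
{
#if defined(__x86_64__) || defined(__i386__)
    /* AT&T syntax: the destination is the last operand, so this is acc += x. */
    __asm__ ("addl %1, %0"
             : "+r"(acc)      /* %0: read and written by the instruction */
             : "r"(x)         /* %1: only read */
             : "cc");         /* addl changes the flags */
#else
    acc += x;                 /* plain C fallback on other targets */
#endif
    return acc;
}
```

Calling `add_in_place(40, 2)` returns 42. Writing to %1 instead would modify an operand the compiler was told is input-only, which is the same kind of mistake as the `fmacd %2, %0, %1` line in the quoted `fmal()`.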
Re: [Mingw-w64-public] Implement fused multiply-add (FMA) functions for x86 families properly
On 1/18/2017 5:14 AM, lhmouse wrote:
> Patch is attached.

I see that you have replaced the x86 parts for fma and fmaf with C code. That seems like a good thing. Is there some reason you can't do that with the ARM versions too? Reducing the amount of platform-specific code also seems like a good thing.

> This patch removes assembly files that implement FMA on ARM and merges
> them into the corresponding C files with the same name using inline assembly.

Umm. Hmm. There are a number of reasons not to use inline asm (for example https://gcc.gnu.org/wiki/DontUseInlineAsm ). Are you sure this is a good idea?

> I don't have any knowledge about ARM assembly. Those functions for ARM
> were created using my x86 assembly knowledge and the actual instructions
> are copy-n-paste'd from old .S files. I don't have an ARM compiler to test
> those functions. Please fix them should they be broken.

Yup, that's one of the downsides to using inline asm.

I'm no ARM expert, but I'm not sure about this ARM code for fmal:

+long double fmal(long double x, long double y, long double z){
+  __asm__ (
+"fmacd %2, %0, %1 \n"
+"fcpyd %0, %2 \n"
+: "+"(z)
+: "w"(x), "w"(y)
+  );

Doesn't fmacd modify %2? That would be (y), which is listed as an input parameter (and therefore is read-only). What's more, I thought fmacd was calculating "Fd + Fn * Fm" where the parameters were "fmacd Fd, Fn, Fm". Such being the case, I would have expected "fmacd %0, %1 %2"? I don't have a way to run this either, but this looks wrong.

Under the nit-picky heading:

+double fma(double x, double y, double z){
+  __asm__ (
+"fmacd %0, %1, %2 \n"
+: "+"(z)
+: "w"(x), "w"(y)
+  );

The \n is redundant. And doesn't the + make the & redundant as well?

+float fmaf(float x, float y, float z){
+  __asm__ (
+"fmacs %0, %1, %2 \n"
+: "+"(z)
+: "t"(x), "t"(y)

The \n is redundant. And doesn't the + make the & redundant as well?

Lastly I gotta ask: Can we use __builtin_fmal? Or is mingw-w64 the one providing the implementations for these?

dw
Re: [Mingw-w64-public] [PATCH v11] crt: Recognize cygwin ptys in isatty
On Sat, Jan 07, 2017 at 10:52:40AM +0500, Mihail Konev wrote:
> Signed-off-by: Mihail Konev
> Moved-from: https://github.com/Alexpux/mingw-w64/pull/3
> Reference: https://cygwin.com/ml/cygwin-developers/2016-11/msg2.html
> ---
> v11: const the exported isatty
>
>  mingw-w64-crt/Makefile.am                      |   1 +
>  mingw-w64-crt/def-include/msvcrt-common.def.in |   2 +-
>  mingw-w64-crt/lib64/moldname-msvcrt.def        |   2 +-
>  mingw-w64-crt/stdio/isatty.c                   | 111 +
>  mingw-w64-headers/crt/io.h                     |   1 +
>  5 files changed, 115 insertions(+), 2 deletions(-)
>  create mode 100644 mingw-w64-crt/stdio/isatty.c
>

Maybe making Cygwin present Win32 applications with a Windows Console (which mirrors pty stdout and drives pty stdin) would be more correct (as it allows smoother interoperability)?
Re: [Mingw-w64-public] [PATCH] libuuid.a: Added a few missing GUIDs.
Jacek, please go ahead and apply.

Thanks,
Kai

2017-01-18 18:04 GMT+01:00 Jacek Caban:
> Please review.
>
> ---
>  mingw-w64-crt/libsrc/uuid.c       | 4
>  mingw-w64-headers/include/cguid.h | 1 +
>  2 files changed, 5 insertions(+)
Re: [Mingw-w64-public] [PATCH] comutil.h: Include comdef.h.
Hi Jacek,

patch is ok. Please apply.

Thanks,
Kai

2017-01-18 18:06 GMT+01:00 Jacek Caban:
> This avoids problems with linking applications that include comutil.h,
> but not comdef.h.
>
> Please review.
>
> ---
>  mingw-w64-headers/include/comutil.h | 4
>  1 file changed, 4 insertions(+)
[Mingw-w64-public] [PATCH] libuuid.a: Added a few missing GUIDs.
Please review.

---
 mingw-w64-crt/libsrc/uuid.c       | 4
 mingw-w64-headers/include/cguid.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/mingw-w64-crt/libsrc/uuid.c b/mingw-w64-crt/libsrc/uuid.c
index 17e5e07..aaeb4e1 100644
--- a/mingw-w64-crt/libsrc/uuid.c
+++ b/mingw-w64-crt/libsrc/uuid.c
@@ -295,3 +295,7 @@ DEFINE_GUID(OLE_DATAPATH_WMF,0x2de03,0,0,0xc0,0,0,0,0,0,0,0x46);
 DEFINE_GUID(OLE_DATAPATH_XBM,0x2de08,0,0,0xc0,0,0,0,0,0,0,0x46);
 DEFINE_GUID(SID_SContainerDispatch,0xb722be00,0x4e68,0x101b,0xa2,0xbc,0,0xaa,0,0x40,0x47,0x70);
 DEFINE_GUID(SID_SDataPathBrowser,0xfc4801a5,0x2ba9,0x11cf,0xa2,0x29,0,0xaa,0,0x3d,0x73,0x52);
+DEFINE_GUID(CLSID_GlobalOptions,0x0000034b,0x0000,0x0000,0xc0,0x00,0x00,0x00,0x00,0x00,0x00,0x46);
+DEFINE_GUID(IID_ICallFrameEvents,0xfd5e0843,0xfc91,0x11d0,0x97,0xd7,0x00,0xc0,0x4f,0xb9,0x61,0x8a);
+DEFINE_GUID(IID_ICallFrameWalker,0x08b23919,0x392d,0x11d2,0xb8,0xa4,0x00,0xc0,0x4f,0xb9,0x61,0x8a);
+DEFINE_GUID(IID_ICallInterceptor,0x60c7ca75,0x896d,0x11d2,0xb8,0xb6,0x00,0xc0,0x4f,0xb9,0x61,0x8a);
diff --git a/mingw-w64-headers/include/cguid.h b/mingw-w64-headers/include/cguid.h
index 99b93b0..497c4c5 100644
--- a/mingw-w64-headers/include/cguid.h
+++ b/mingw-w64-headers/include/cguid.h
@@ -46,6 +46,7 @@ extern "C" {
   extern const CLSID CLSID_StaticDib;
   extern const CLSID CID_CDfsVolume;
   extern const CLSID CLSID_DCOMAccessControl;
+  extern const CLSID CLSID_GlobalOptions;
   extern const CLSID CLSID_StdGlobalInterfaceTable;
   extern const CLSID CLSID_ComBinding;
   extern const CLSID CLSID_StdEvent;
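For readers unfamiliar with the eleven-argument form used in uuid.c, here is a rough, portable sketch of how those arguments map onto the usual 8-4-4-4-12 textual form of a GUID. Everything prefixed with `example_` or `EXAMPLE_` is a hypothetical stand-in invented for this illustration; the real `GUID` type and `DEFINE_GUID` macro come from the Windows headers, and the real macro's expansion depends on whether `INITGUID` is defined.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the Windows GUID structure (hypothetical name). */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} example_guid;

/* Stand-in for DEFINE_GUID, taking its arguments in the same order as uuid.c. */
#define EXAMPLE_DEFINE_GUID(name, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8) \
    static const example_guid name = { l, w1, w2, { b1, b2, b3, b4, b5, b6, b7, b8 } }

/* Same values as the IID_ICallFrameEvents line added by the patch. */
EXAMPLE_DEFINE_GUID(EXAMPLE_IID_ICallFrameEvents,
    0xfd5e0843, 0xfc91, 0x11d0, 0x97, 0xd7, 0x00, 0xc0, 0x4f, 0xb9, 0x61, 0x8a);

/* Writes the 36-character textual form into out, which must hold 37 bytes. */
static void example_guid_to_string(const example_guid *g, char out[37])
{
    snprintf(out, 37, "%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
             (unsigned long)g->data1, (unsigned)g->data2, (unsigned)g->data3,
             (unsigned)g->data4[0], (unsigned)g->data4[1],
             (unsigned)g->data4[2], (unsigned)g->data4[3],
             (unsigned)g->data4[4], (unsigned)g->data4[5],
             (unsigned)g->data4[6], (unsigned)g->data4[7]);
}

/* Returns nonzero if g, rendered as text, equals the given string. */
static int example_guid_matches(const example_guid *g, const char *text)
{
    char buf[37];
    example_guid_to_string(g, buf);
    return strcmp(buf, text) == 0;
}
```

With these definitions, `example_guid_matches(&EXAMPLE_IID_ICallFrameEvents, "FD5E0843-FC91-11D0-97D7-00C04FB9618A")` is true: the first three arguments become the first three groups, and the eight byte arguments fill the last two groups.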
[Mingw-w64-public] [PATCH] comutil.h: Include comdef.h.
This avoids problems with linking applications that include comutil.h,
but not comdef.h.

Please review.

---
 mingw-w64-headers/include/comutil.h | 4
 1 file changed, 4 insertions(+)

diff --git a/mingw-w64-headers/include/comutil.h b/mingw-w64-headers/include/comutil.h
index 57f9ea3..f6d7adc 100644
--- a/mingw-w64-headers/include/comutil.h
+++ b/mingw-w64-headers/include/comutil.h
@@ -1211,6 +1211,10 @@ extern _variant_t vtMissing;
 
 #pragma pop_macro("new")
 
+/* We use _com_issue_error here, but we only provide its inline version in comdef.h,
+ * so we need to make sure that it's included as well. */
+#include <comdef.h>
+
 #endif /* __cplusplus */
 
 #endif
Re: [Mingw-w64-public] Implement fused multiply-add (FMA) functions for x86 families properly
The correctness of the fma() function can be verified using the following program:

---
#include <stdio.h>
#include <math.h>

volatile double x = 0x1.0000000000003p52;
volatile double y = 0x1.0000000000005p52;
volatile double z = -0x1.0000000000008p104;

int main(){
  printf("x * y + z    = %f\n", x * y + z);
  printf("fma(x, y, z) = %f\n", fma(x, y, z));
}
---

A naive multiply-then-add loses some LSBs during the multiplication and yields zero when the MSBs are cancelled by the negative addend. A true FMA function yields 15 in this example.

--
Best regards,
lh_mouse
2017-01-18
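The two results quoted above (zero for the naive version, 15 for a true FMA) can be cross-checked with integer arithmetic. This is only a sketch, and it rests on two assumptions: that the operands are x = 2^52 + 3, y = 2^52 + 5 and z = -(2^104 + 2^55), which is the reading of the constants under which both quoted results hold, and that the compiler provides the `unsigned __int128` extension (64-bit GCC or Clang). The names `naive_mul_add` and `exact_mul_add_example` are made up for the illustration.

```c
#include <stdint.h>

/* Naive multiply-then-add: the product is rounded to double before the add.
 * The volatile store keeps the compiler from contracting the two operations
 * into a single fused instruction. */
static double naive_mul_add(double x, double y, double z)
{
    volatile double product = x * y;
    return product + z;
}

/* Exact value of x * y + z for the assumed operands, using 128-bit integers
 * so that no bit of the product is lost. */
static uint64_t exact_mul_add_example(void)
{
    const uint64_t x = (UINT64_C(1) << 52) + 3;   /* 0x1.0000000000003p52  */
    const uint64_t y = (UINT64_C(1) << 52) + 5;   /* 0x1.0000000000005p52  */
    const unsigned __int128 z_magnitude =         /* 0x1.0000000000008p104 */
        ((unsigned __int128)1 << 104) + ((unsigned __int128)8 << 52);

    /* x * y = 2^104 + 8 * 2^52 + 15, so subtracting |z| leaves 15. */
    return (uint64_t)((unsigned __int128)x * y - z_magnitude);
}
```

The exact product is 2^104 + 2^55 + 15. A double near 2^104 has a spacing of 2^52 between neighbouring values, so the trailing 15 is rounded away before the addition, and `naive_mul_add(0x1.0000000000003p52, 0x1.0000000000005p52, -0x1.0000000000008p104)` returns 0, while `exact_mul_add_example()` returns 15, the value a correctly fused operation should produce.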
[Mingw-w64-public] Implement fused multiply-add (FMA) functions for x86 families properly
Patch is attached.

This patch removes assembly files that implement FMA on ARM and merges them into the corresponding C files with the same name using inline assembly. A re-generation of Makefile.in is required.

I don't have any knowledge about ARM assembly. Those functions for ARM were created using my x86 assembly knowledge and the actual instructions are copy-n-paste'd from old .S files. I don't have an ARM compiler to test those functions. Please fix them should they be broken.

--
Best regards,
lh_mouse
2017-01-18

From 0534577644f12e94cc408d37083277f133d1ca47 Mon Sep 17 00:00:00 2001
From: LH_Mouse
Date: Wed, 18 Jan 2017 19:35:43 +0800
Subject: [PATCH] mingw-w64-crt/math/fma{,f,l}.c: Implement fused multiply-add (FMA) functions for x86 families properly.
 mingw-w64-crt/Makefile.am: Likewise.
 mingw-w64-crt/math/fma{,f}.S: Merge into corresponding C files with the same names, respectively.

---
 mingw-w64-crt/Makefile.am | 4 +-
 mingw-w64-crt/math/fma.S  | 42 ---
 mingw-w64-crt/math/fma.c  | 31 +++
 mingw-w64-crt/math/fmaf.S | 43 ---
 mingw-w64-crt/math/fmaf.c | 31 +++
 mingw-w64-crt/math/fmal.c | 135 --
 6 files changed, 194 insertions(+), 92 deletions(-)
 delete mode 100644 mingw-w64-crt/math/fma.S
 create mode 100644 mingw-w64-crt/math/fma.c
 delete mode 100644 mingw-w64-crt/math/fmaf.S
 create mode 100644 mingw-w64-crt/math/fmaf.c

diff --git a/mingw-w64-crt/Makefile.am b/mingw-w64-crt/Makefile.am
index 44360db..5eba234 100644
--- a/mingw-w64-crt/Makefile.am
+++ b/mingw-w64-crt/Makefile.am
@@ -227,7 +227,6 @@ src_libmingwex=\
   \
   math/_chgsignl.S math/ceil.S math/ceilf.S math/ceill.S math/copysignl.S \
   math/floor.S math/floorf.S math/floorl.S \
-  math/fma.S math/fmaf.S \
   math/nearbyint.S math/nearbyintf.S math/nearbyintl.S \
   math/trunc.S math/truncf.S \
   math/cbrt.c \
@@ -235,7 +234,8 @@ src_libmingwex=\
   math/coshf.c math/coshl.c math/erfl.c \
   math/expf.c \
   math/fabs.c math/fabsf.c math/fabsl.c math/fdim.c math/fdimf.c math/fdiml.c \
-  math/fmal.c math/fmax.c math/fmaxf.c math/fmaxl.c math/fmin.c math/fminf.c \
+  math/fma.c math/fmaf.c math/fmal.c \
+  math/fmax.c math/fmaxf.c math/fmaxl.c math/fmin.c math/fminf.c \
   math/fminl.c math/fp_consts.c math/fp_constsf.c \
   math/fp_constsl.c math/fpclassify.c math/fpclassifyf.c math/fpclassifyl.c math/frexpf.c \
   math/hypotf.c math/hypot.c math/hypotl.c math/isnan.c math/isnanf.c math/isnanl.c \
diff --git a/mingw-w64-crt/math/fma.S b/mingw-w64-crt/math/fma.S
deleted file mode 100644
index 74becde..000
--- a/mingw-w64-crt/math/fma.S
+++ /dev/null
@@ -1,42 +0,0 @@
-/**
- * This file has no copyright assigned and is placed in the Public Domain.
- * This file is part of the mingw-w64 runtime package.
- * No warranty is given; refer to the file DISCLAIMER.PD within this package.
- */
-#include <_mingw_mac.h>
-
-    .file "fma.S"
-    .text
-#ifdef __x86_64__
-    .align 8
-#else
-    .align 4
-#endif
-    .p2align 4,,15
-    .globl __MINGW_USYMBOL(fma)
-    .def __MINGW_USYMBOL(fma); .scl 2; .type 32; .endef
-__MINGW_USYMBOL(fma):
-#if defined(_AMD64_) || defined(__x86_64__)
-    subq $56, %rsp
-    movsd %xmm0,(%rsp)
-    movsd %xmm1,16(%rsp)
-    movsd %xmm2,32(%rsp)
-    fldl (%rsp)
-    fmull 16(%rsp)
-    fldl 32(%rsp)
-    faddp
-    fstpl (%rsp)
-    movsd (%rsp),%xmm0
-    addq $56, %rsp
-    ret
-#elif defined(_ARM_) || defined(__arm__)
-    fmacd d2, d0, d1
-    fcpyd d0, d2
-    bx lr
-#elif defined(_X86_) || defined(__i386__)
-    fldl 4(%esp)
-    fmull 12(%esp)
-    fldl 20(%esp)
-    faddp
-    ret
-#endif
diff --git a/mingw-w64-crt/math/fma.c b/mingw-w64-crt/math/fma.c
new file mode 100644
index 000..98249aa
--- /dev/null
+++ b/mingw-w64-crt/math/fma.c
@@ -0,0 +1,31 @@
+/**
+ * This file has no copyright assigned and is placed in the Public Domain.
+ * This file is part of the mingw-w64 runtime package.
+ * No warranty is given; refer to the file DISCLAIMER.PD within this package.
+ */
+double fma(double x, double y, double z);
+
+#if defined(_AMD64_) || defined(__x86_64__) || defined(_X86_) || defined(__i386__)
+
+long double fmal(long double x, long double y, long double z);
+
+double fma(double x, double y, double z){
+  return (double)fmal(x, y, z);
+}
+
+#elif defined(_ARM_) || defined(__arm__)
+
+double fma(double x, double y, double