> I see that you have replaced the x86 parts for fma and fmaf with C
> code. That seems like a good thing. Is there some reason you can't do
> that with the ARM versions too?

ARM has hardware FMA, and software emulation is not optimal.

> Reducing the amount of platform-specific code also seems like a good
> thing.

The x87 80-bit floating-point format is already platform-specific.

> There are a number of reasons not to use inline asm (for example
> https://gcc.gnu.org/wiki/DontUseInlineAsm ). Are you sure this is a
> good idea?

I am not sure about the inline asm itself. The primary reason I did that
is that, if we have `fma.S` and `fma.c` in the same directory, they both
compile to the same object file `fma.o`, and `make` complains about that.

Inline asm is indeed hard to maintain, and I am aware of that.
Personally, I only write asm statements that contain very few
instructions, simulating builtin functions or intrinsics for use in C
code.

> Yup, that's one of the downsides to using inline asm.
>
> I'm no ARM expert, but I'm not sure about this ARM code for fmal:
>
> +long double fmal(long double x, long double y, long double z){
> +  __asm__ (
> +    "fmacd %2, %0, %1 \n"
> +    "fcpyd %0, %2 \n"
> +    : "+&w"(z)
> +    : "w"(x), "w"(y)
> +  );
>
> Doesn't fmacd modify %2? That would be (y), which is listed as an input
> parameter (and therefore is read-only). What's more, I thought fmacd
> was calculating "Fd + Fn * Fm" where the parameters were "fmacd Fd, Fn,
> Fm". Such being the case, I would have expected "fmacd %0, %1, %2"? I
> don't have a way to run this either, but this looks wrong.

Thanks for pointing it out. That is a mistake: I forgot to fix it after
copying it from the asm code. The `fma()` function is the correct one.

> Under the nit-picky heading:
>
> +double fma(double x, double y, double z){
> +  __asm__ (
> +    "fmacd %0, %1, %2 \n"
> +    : "+&w"(z)
> +    : "w"(x), "w"(y)
> +  );
>
> The \n is redundant. And doesn't the + make the & redundant as well?

I just prefer to terminate every line of asm code with \n. I believe the
& is redundant not only because of the +, but also because there is only
one instruction, so nothing can be written before the other operands are
read.

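To make the operand order discussed above concrete, here is a minimal sketch of how the corrected `fmal()` might look if it mirrors the `fma()` version. It is not the committed code. The name `sketch_fmal`, the preprocessor guard, the `%P` operand modifier and the non-ARM fallback are all assumptions added for illustration. It also assumes that `long double` has the same representation as `double` on 32-bit ARM. Whether `fmacd` satisfies the single-rounding requirement of a true FMA is not addressed here.

```c
/* Sketch only: sketch_fmal is a hypothetical name, chosen so that it
 * does not clash with the real fmal(). */
long double sketch_fmal(long double x, long double y, long double z)
{
#if defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
    /* fmacd Fd, Fn, Fm computes Fd = Fd + Fn * Fm, so the accumulator z
     * is operand 0 and must be read-write ("+w"); x and y are inputs.
     * ASSUMPTION: %P asks GCC to print the double-precision register
     * name (dN); the quoted patch uses plain %0, %1, %2. */
    __asm__ ("fmacd %P0, %P1, %P2" : "+w"(z) : "w"(x), "w"(y));
    return z;
#else
    /* Fallback so that the sketch compiles on other targets. This is a
     * plain multiply-add with two roundings, NOT a fused operation. */
    return x * y + z;
#endif
}
```

With a single instruction and a single read-write operand, no second `fcpyd` is needed, because the result is already in operand 0.
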
> Lastly I gotta ask: Can we use __builtin_fmal? Or is mingw-w64 the one
> providing the implementations for these?

We would have to ask a GCC developer to be sure. In my experience, this
builtin is guaranteed to be semantically equivalent to the standard
library function of the same name without the `__builtin_` prefix.
Sometimes the compiler cannot assume that all functions from the
standard C library are available and behave as specified, e.g. when
compiling the Linux kernel. The `__builtin_fmal()` function is then
still considered to be a standard FMA, suitable for constant folding. It
may result in an inline instruction where possible, but could also
result in a call to the external `fmal()` function, which would cause
infinite recursion if it were used inside `fmal()` itself.

------------------
Best regards,
lh_mouse
2017-01-19

_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public