Re: [Mingw-w64-public] Implement fused multiply-add (FMA) functions for x86 families properly

lhmouse Wed, 18 Jan 2017 21:38:37 -0800

> I see that you have replaced the x86 parts for fma and fmaf with C 
> code.  That seems like a good thing.  Is there some reason you can't do 
> that with the ARM versions too?
ARM has hardware FMA, so software emulation would not be optimal there.


> Reducing the amount of platform-specific code also seems like a good thing.
The x87 80-bit floating point format is already platform-specific.

> There are a number of reasons not to use inline asm (for example 
> https://gcc.gnu.org/wiki/DontUseInlineAsm ).  Are you sure this is a 
> good idea?
I am not sure about the inline asm itself. The primary reason I did that
is that, if `fma.S` and `fma.c` are in the same directory, they both
compile to the same object file `fma.o`, and `make` complains about that.

Inline asm is indeed hard to maintain and I am aware of it. Personally
I only write asm statements that contain very few instructions, simulating
builtin functions or intrinsics for use in C code.

> Yup, that's one of the downsides to using inline asm.
> 
> I'm no ARM expert, but I'm not sure about this ARM code for fmal:
> 
> +long double fmal(long double x, long double y, long double z){
> +  __asm__ (
> +    "fmacd %2, %0, %1 \n"
> +    "fcpyd %0, %2 \n"
> +    : "+&w"(z)
> +    : "w"(x), "w"(y)
> +  );
> 
> Doesn't fmacd modify %2?  That would be (y), which is listed as an input 
> parameter (and therefore is read-only).  What's more, I thought fmacd 
> was calculating "Fd + Fn * Fm" where the parameters were "fmacd Fd, Fn, 
> Fm".  Such being the case, I would have expected "fmacd %0, %1 %2"?  I 
> don't have a way to run this either, but this looks wrong.
Thanks for pointing it out. That is a mistake. I forgot to fix it after
copying it from the asm code. The `fma()` function was the correct one.

> Under the nit-picky heading:
> 
> +double fma(double x, double y, double z){
> +  __asm__ (
> +    "fmacd %0, %1, %2 \n"
> +    : "+&w"(z)
> +    : "w"(x), "w"(y)
> +  );
> 
> The \n is redundant.  And doesn't the + make the & redundant as well?
I just prefer to terminate every line of asm code with \n.

I believe the & is redundant not only because of the +, but also because
there is only one instruction, so nothing can be written before the
other operands are read.

> Lastly I gotta ask: Can we use __builtin_fmal?  Or is mingw-w64 the one 
> providing the implementations for these?
We would have to ask a GCC developer to be sure. In my experience this
function is guaranteed to be semantically equivalent to the standard
library function without the __builtin_ prefix. Sometimes the compiler
cannot assume that all functions from the standard C library are
available and behave as specified, e.g. when compiling the Linux
kernel. The `__builtin_fmal()` function is then still treated as
a standard FMA, suitable for constant folding. It may compile to an inline
instruction where possible, but could also compile to a call to the
external `fmal()` function, which would cause infinite recursion if it
were used inside `fmal()` itself.

------------------                               
Best regards,
lh_mouse
2017-01-19



_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
