Re: [Mingw-w64-public] Implement fused multiply-add (FMA) functions for x86 families properly

lhmouse Thu, 19 Jan 2017 23:20:07 -0800

> So you have decided that __builtins can't be used then?  That's too bad.
Yes, it results in a call to `fma()` on x64. I can't test it on ARM though.


> I know almost nothing about the guts of floating point, so I'm prepared
> to defer to your judgement, but here's what I think:
> 
> Let me propose an alternative for fma.c:
> ... ...
> In other words, remove all the platform specific code.  This (greatly)
> simplifies this file.  You were already using fmal for x86.  And it
> doesn't lose anything for ARM, since both fma() and fmal() use the exact
> same inline asm.  Why have the exact same (hard to maintain) code in 2
> places?
Keeping asm code in fmaf.c but not in fma.c seems stylistically inconsistent.
However, the converse is doable: on ARM, have `fmal()` call `fma()`.

> As for fmaf, what about:
> ... ...
> The case here is less compelling, but I assert that if fmal is
> supported, it can always be used to calculate fmaf.  If there is a
> shorter/more efficient method (such as there is with ARM), it can be
> added here.
Fair enough. Updated.

> As for fmal, I have a question about your code.  Not the implementation,
> but the design.  Looking at https://en.wikipedia.org/wiki/Long_double,
> it says "Microsoft Windows with Visual C++ also sets the processor in
> double-precision mode by default."  Since (it appears?) you aren't
> following _controlfp_s, won't this give us a different answer than fmal
> from msvcr120.dll?
MSVC doesn't support 80-bit `long double` (it is 64 bits there), so
the results can't be equal unless the exact result fits into 64 bits.
My FMA algorithm basically splits each operand into two 32-bit halves,
multiplies them using elementary arithmetic, then adds the four 64-bit
partial products together: (a+b)(c+d) = ac+(bc+ad)+bd. So the precision
of x87 indeed affects the result.
I doubt whether it is necessary to save the x87 control word and set it to
64-bit precision before the calculation and restore it thereafter. MinGW-w64
already sets it to 64-bit precision during CRT initialization, and if people
set it lower they aren't going to need `fma()` either.
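
To illustrate the splitting identity above, here is a minimal sketch on
plain integers. It is not the code from the patch (which works on x87
`long double` values); the type `u128_parts` and the function `mul_64x64`
are hypothetical names made up for this example. It only shows how four
32x32 partial products combine into a full 128-bit product.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value as two 64-bit halves. */
typedef struct { uint64_t hi, lo; } u128_parts;

/* Multiply two 64-bit integers exactly by splitting each into two
 * 32-bit halves: (a*2^32 + b)(c*2^32 + d)
 *              = ac*2^64 + (ad + bc)*2^32 + bd. */
static u128_parts mul_64x64(uint64_t x, uint64_t y)
{
    uint64_t a = x >> 32, b = x & 0xFFFFFFFFu;
    uint64_t c = y >> 32, d = y & 0xFFFFFFFFu;

    /* Each partial product fits in 64 bits. */
    uint64_t ac = a * c;
    uint64_t ad = a * d;
    uint64_t bc = b * c;
    uint64_t bd = b * d;

    /* Sum the middle column; at most three 32-bit values, so no overflow. */
    uint64_t mid = (bd >> 32) + (ad & 0xFFFFFFFFu) + (bc & 0xFFFFFFFFu);

    u128_parts r;
    r.lo = (mid << 32) | (bd & 0xFFFFFFFFu);
    r.hi = ac + (ad >> 32) + (bc >> 32) + (mid >> 32);
    return r;
}
```

With the exact product in hand, the addend can be added and the sum
rounded once, which is the point of a fused multiply-add.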

An interesting look at https://msdn.microsoft.com/en-us/library/c9676k6h.aspx
reminds me that _PC_64 isn't supported on x64. Sounds incredible, no? Does
`_controlfp_s()` return an error if we try to set _PC_64 on x64? I have no
idea. Nevertheless, the precision flags can be set and restored using inline
assembly, which is yet another dirty solution.
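
For reference, the inline-assembly route could look roughly like the
sketch below. It assumes GCC-style extended asm; the helper names
(`x87_get_cw`, `x87_set_cw`, `x87_enter_extended`) and the macros are
invented for this example and are not part of the patch. The non-x86
branch is only a placeholder so the sketch still compiles elsewhere.

```c
#include <stdint.h>

#define X87_PC_MASK 0x0300u  /* precision-control field, bits 8-9 */
#define X87_PC_64   0x0300u  /* 11b: 64-bit significand (extended) */

#if defined(__i386__) || defined(__x86_64__)
/* Read the x87 control word. */
static inline uint16_t x87_get_cw(void)
{
    uint16_t cw;
    __asm__ __volatile__("fnstcw %0" : "=m"(cw));
    return cw;
}

/* Load a new x87 control word. */
static inline void x87_set_cw(uint16_t cw)
{
    __asm__ __volatile__("fldcw %0" : : "m"(cw));
}
#else
/* Placeholder for non-x86 targets: models the control word in memory. */
static uint16_t x87_fake_cw = 0x027Fu;
static inline uint16_t x87_get_cw(void) { return x87_fake_cw; }
static inline void x87_set_cw(uint16_t cw) { x87_fake_cw = cw; }
#endif

/* Switch to 64-bit significand precision and return the previous
 * control word so the caller can restore it afterwards. */
static inline uint16_t x87_enter_extended(void)
{
    uint16_t old = x87_get_cw();
    x87_set_cw((uint16_t)((old & ~X87_PC_MASK) | X87_PC_64));
    return old;
}
```

A caller would wrap the computation as `old = x87_enter_extended(); ...;
x87_set_cw(old);`. Whether that overhead is worth paying on every call
is exactly the doubt raised above.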

> More nits:
>
> s/whecher/whether
> s/#x86_Extended_Precision_Format/#x86_extended_precision_format
Fixed. The bookmark to Wikipedia was copied from my browser at least half a
year ago, and the anchor has probably been changed since then.

------------------
Best regards,
lh_mouse
2017-01-20

_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
