Re: PPC64: Poor StrictMath performance due to non-optimized compilation

joe darcy Mon, 21 Nov 2016 16:43:30 -0800

Hello,


On 11/21/2016 4:34 PM, Gustavo Romero wrote:

Hi Chris,

On 17-11-2016 19:48, Chris Plummer wrote:

The fdlibm code relies on aliasing a two-element array of int with a double to
do bit-level reads and writes of floating-point values. As I understand it, the
C spec allows compilers to assume values of different types don't overlap in
memory. The compilation environment has to be configured in such a way that the
C compiler disables code generation and optimization techniques that would run
afoul of these fdlibm coding practices.
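
For readers unfamiliar with what these bit-level reads and writes amount to,
here is a minimal sketch in Java rather than C. It is not fdlibm code; the class
and method names (DoubleWords, hi, lo, withHi) are hypothetical. It only shows
the high/low 32-bit word access that the C sources obtain through the int-array
alias, expressed with Double.doubleToRawLongBits so that no aliasing assumption
is involved.

```java
public class DoubleWords {
    // High 32 bits of the IEEE 754 encoding: sign, exponent, top 20 fraction bits.
    static int hi(double x) {
        return (int) (Double.doubleToRawLongBits(x) >>> 32);
    }

    // Low 32 bits of the encoding: the remaining fraction bits.
    static int lo(double x) {
        return (int) Double.doubleToRawLongBits(x);
    }

    // Return x with its high word replaced, leaving the low word untouched.
    static double withHi(double x, int newHi) {
        long bits = Double.doubleToRawLongBits(x);
        long low = bits & 0x00000000FFFFFFFFL;
        return Double.longBitsToDouble((((long) newHi) << 32) | low);
    }

    public static void main(String[] args) {
        // 1.0 is encoded as 0x3FF0000000000000.
        System.out.println(Integer.toHexString(hi(1.0))); // 3ff00000
        System.out.println(lo(1.0));                      // 0
        // Writing 0x40000000 into the high word turns 1.0 into 2.0.
        System.out.println(withHi(1.0, 0x40000000));      // 2.0
    }
}
```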

This is the strict aliasing issue, right? It's a long-standing problem with
fdlibm that kept getting worse as gcc got smarter. IIRC, compiling with
-fno-strict-aliasing fixes it, but it's been more than 12 years since I last
dealt with fdlibm and compiler aliasing issues.

I've tested with -O3 and -fno-strict-aliasing as you suggested but it did not
fix the fp precision issue on PPC64.

After finding that -fno-expensive-optimizations solved the problem, we narrowed
down the problem to the FMA: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78386

That makes sense; an FMA will by its nature provide different results than
separate (unfused) multiply and add operations. While the polynomials used in
fdlibm would benefit performance-wise from implicit replacement with FMA, such
a replacement would violate the StrictMath contract. Therefore, if FDLIBM is
left in C sources, it must be compiled in such a way that FMA is *not*
substituted for multiply and add.
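
To make the difference concrete, here is a small sketch, assuming a JDK that
provides Math.fma (JDK 9 or later). The class name FmaDemo and the input values
are chosen for illustration only and are not taken from fdlibm. The unfused form
rounds the product before the add; the fused form rounds only once, after the
add.

```java
public class FmaDemo {
    // Separate multiply and add: a*b is rounded to double, then c is added.
    static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    // Fused multiply-add: a*b + c is computed exactly and rounded once.
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        double a = 1.0 + 0x1.0p-27;    // exactly representable
        double c = -(1.0 + 0x1.0p-26); // exactly representable

        // Exact a*a is 1 + 2^-26 + 2^-54. The 2^-54 term is below half an ulp,
        // so the rounded product is 1 + 2^-26 and the unfused sum is 0.0.
        System.out.println(unfused(a, a, c)); // 0.0

        // The fused form keeps the 2^-54 term, which survives the single rounding.
        System.out.println(fused(a, a, c) == 0x1.0p-54); // true
    }
}
```

The two results differ in every bit, which is why a compiler that silently
contracts a multiply and add into an FMA changes what the fdlibm polynomials
return.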


Thanks,

-Joe
