Re: PPC64: Poor StrictMath performance due to non-optimized compilation

Chris Plummer Mon, 21 Nov 2016 17:34:35 -0800

On 11/21/16 4:27 PM, Gustavo Romero wrote:

Hi Joe,


On 17-11-2016 19:33, joe darcy wrote:

Currently, optimization for building fdlibm is disabled, except for the
"solaris" OS target [1].

The reason is that, historically, the Solaris compilers have had
sufficient discipline and control regarding floating-point semantics and
compiler optimizations to still implement the Java-mandated results when
optimization was enabled. The gcc family of compilers, for example, has
lacked such discipline.

Oh, I see. Thanks for clarifying that. I was wondering exactly why fdlibm
optimization is off even for x86_64, since, as far as I can see with gcc 5
only, it does not affect the precision, even if setting -O3 does not improve
the performance as much as on PPC64.

The fdlibm code relies on aliasing a two-element array of int with a double to 
do bit-level reads and writes of floating-point values. As I understand it, the 
C spec allows compilers to assume values
of different types don't overlap in memory. The compilation environment has to 
be configured in such a way that the C compiler disables code generation and 
optimization techniques that would run afoul
of these fdlibm coding practices.
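
In Java terms, the bit-level reads and writes described above can be sketched
roughly as follows. This is only an illustration, assuming the high/low 32-bit
word split that the fdlibm C code obtains by aliasing a double with an int
array; the class and method names (FdlibmWords, hi, lo, fromWords) are
hypothetical and are not taken from the fdlibm sources.

```java
// Hypothetical helper: reads and writes the two 32-bit words of a double
// without any aliasing, using the raw IEEE 754 bit pattern instead.
final class FdlibmWords {
    // High word: sign bit, 11 exponent bits, top 20 bits of the significand.
    static int hi(double x) {
        return (int) (Double.doubleToRawLongBits(x) >>> 32);
    }

    // Low word: bottom 32 bits of the significand.
    static int lo(double x) {
        return (int) Double.doubleToRawLongBits(x);
    }

    // Reassemble a double from its two words.
    static double fromWords(int hi, int lo) {
        long bits = ((long) hi << 32) | (lo & 0xffffffffL);
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double x = 1.0;
        System.out.printf("hi=0x%08x lo=0x%08x%n", hi(x), lo(x));
    }
}
```

Because the conversion goes through an explicit bit-pattern call rather than
through two differently typed views of the same memory, there is no aliasing
assumption for a compiler to exploit.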

After discussing with the Power toolchain folks, we narrowed the issue on PPC64
down to FMA. -fno-strict-aliasing has no effect: when used with aggressive
optimization it does not solve the precision issue. Thus -ffp-contract=off is
the best option we have for now to optimize fdlibm on PPC64.

Ah! I should have thought of this. I dealt with fdlibm FMA issues on ppc about 15 years ago. At the time -mno-fused-madd was the solution. I don't think -ffp-contract=off existed back then.


Chris
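
Why a fused multiply-add changes results can be shown with a small sketch.
Java never fuses a*b + c on its own, so Math.fma (Java 9 and later) stands in
here for the fused instruction that a C compiler may emit when contraction is
allowed. The class name and the input values are chosen purely for
illustration and do not come from fdlibm.

```java
// Illustration only: one rounding (fused) versus two roundings (separate).
final class FmaContraction {
    // A = 1 + 2^-27, so A*A = 1 + 2^-26 + 2^-54 exactly.
    // The 2^-54 term is lost when the product is rounded to a double.
    static final double A = 1.0 + 0x1.0p-27;
    static final double C = -(1.0 + 0x1.0p-26);

    // Product rounded first, then the sum rounded: yields 0.0.
    static double separate() {
        return A * A + C;
    }

    // Exact product and sum, rounded once at the end: yields 2^-54.
    static double fused() {
        return Math.fma(A, A, C);
    }

    public static void main(String[] args) {
        System.out.println("separate = " + separate());
        System.out.println("fused    = " + fused());
    }
}
```

Neither answer is "wrong" arithmetically, but only the separately rounded one
matches what code written without FMA in mind produces when compiled without
contraction.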

Methods in the Math class, such as pow, are often intrinsified and use a 
different algorithm so a straight performance comparison may not be as fair or 
meaningful in those cases.

I agree. It's just that the issue on StrictMath methods was first noted due to
that huge gap (Math vs StrictMath) on PPC64, which is not prominent on x64.

Depending on how Math.{sin, cos} is implemented on PPC64, compiling the fdlibm
sin/cos with more aggressive optimizations should not be expected to close the
performance gap. In particular, if Math.{sin, cos} is an intrinsic on PPC64
(I haven't checked the sources) that uses a platform-specific feature (say,
fused multiply-add instructions), then just compiling fdlibm more aggressively
wouldn't necessarily make up that gap.

In our case (PPC64) it does close the gap. Non-optimized code will suffer a lot,
for instance, from load-hit-store issues. Contrary to what happens on PPC64, the
gap on x64 seems to be quite small as you said.

To allow cross-platform and cross-release reproducibility, StrictMath is
specified to use the particular fdlibm algorithms, which precludes using better
algorithms developed more recently. If we were to start with a clean slate
today, to get such reproducibility we would specify correctly-rounded behavior
of all those methods, but such an approach was much less technically tractable
20+ years ago without the benefit of the research that has been done in the
interim, such as the work of Prof. Muller and associates:
https://lipforge.ens-lyon.fr/projects/crlibm/.

Accumulating the results of the functions and comparing the sums is not a
sufficiently robust way of checking whether the optimized versions are indeed
equivalent to the non-optimized ones. The specification of StrictMath requires
a particular result for each set of floating-point arguments, and sums can
round away low-order bits that differ.
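
The point about sums can be made concrete with a short sketch. The class and
method names (BitwiseCompare, sum, countBitMismatches) and the sample values
are hypothetical; the idea is simply to compare each result bit for bit rather
than comparing accumulated totals.

```java
// Illustration only: equal sums do not imply equal per-element results.
final class BitwiseCompare {
    // Naive left-to-right accumulation, as a benchmark-style check might do.
    static double sum(double[] values) {
        double s = 0.0;
        for (double v : values) {
            s += v;
        }
        return s;
    }

    // Counts positions where the two arrays differ in their raw bit patterns.
    // This also distinguishes 0.0 from -0.0, which == does not.
    static int countBitMismatches(double[] expected, double[] actual) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int mismatches = 0;
        for (int i = 0; i < expected.length; i++) {
            long e = Double.doubleToRawLongBits(expected[i]);
            long a = Double.doubleToRawLongBits(actual[i]);
            if (e != a) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        double[] reference = {1.0e16, 0.25};
        double[] candidate = {1.0e16, Math.nextUp(0.25)}; // off by one ulp
        System.out.println("sums equal: " + (sum(reference) == sum(candidate)));
        System.out.println("bit mismatches: " + countBitMismatches(reference, candidate));
    }
}
```

Here the one-ulp difference in the second element disappears once it is added
to 1.0e16, so the sums agree even though the individual results do not.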

That's a really good point, thanks for letting me know about that. I'll re-test
my change from that perspective.

Running the JDK math library regression tests and corresponding JCK tests is 
recommended for work in this area.

Got it. By "the JDK math library regression tests", which test suite exactly
do you mean? The jtreg tests?

Specifically, the regression tests under test/java/lang/Math and 
test/java/lang/StrictMath in the jdk repository. There are some other math 
library tests in the hotspot repo, but I don't know where
they are offhand.

A note on methodology: when I've been writing tests for my port, I've tried to
include test cases that exercise all the branch points in the code. Due to the
large input space (~2^64 for a single-argument method), random sampling alone
is an inefficient way to try to find differences in behavior.
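
A minimal sketch of that kind of branch-point probing is shown below. The
class and method names (BranchPointProbes, around, mismatches) are
hypothetical, and the thresholds in main are illustrative values rather than
an authoritative list of the branch points in any fdlibm routine.

```java
import java.util.function.DoubleUnaryOperator;

// Illustration only: probe inputs just below, at, and just above a threshold,
// on both sides of zero, and compare two implementations bit for bit.
final class BranchPointProbes {
    static double[] around(double threshold) {
        double below = Math.nextDown(threshold);
        double above = Math.nextUp(threshold);
        return new double[] {below, threshold, above, -below, -threshold, -above};
    }

    // Number of probe inputs on which the two implementations disagree.
    static int mismatches(double[] inputs,
                          DoubleUnaryOperator reference,
                          DoubleUnaryOperator candidate) {
        int count = 0;
        for (double x : inputs) {
            long r = Double.doubleToRawLongBits(reference.applyAsDouble(x));
            long c = Double.doubleToRawLongBits(candidate.applyAsDouble(x));
            if (r != c) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed, illustrative thresholds: a tiny-argument cutoff and pi/4.
        double[] thresholds = {0x1.0p-27, Math.PI / 4};
        for (double t : thresholds) {
            int bad = mismatches(around(t), StrictMath::sin, StrictMath::sin);
            System.out.println("threshold " + t + ": " + bad + " mismatches");
        }
    }
}
```

In real use the candidate would be the optimized build under test, and the
thresholds would be read off the comparisons in the source of each routine.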

I'll need some help with testing against JCK/TCK.

I believe the JCK/TCK does have additional testcases relevant here.

HTH; thanks,

-Joe

Thank you very much for the valuable comments.

I'll send a webrev accordingly for review.

I filed a bug: https://bugs.openjdk.java.net/browse/JDK-8170153


Best regards,
Gustavo
