Hi Joe,

On 17-11-2016 19:33, joe darcy wrote:
>>>> Currently, optimization for building fdlibm is disabled, except for the
>>>> "solaris" OS target [1].
>>> The reason for that is because historically the Solaris compilers have
>>> had sufficient discipline and control regarding floating-point semantics
>>> and compiler optimizations to still implement the Java-mandated results
>>> when optimization was enabled. The gcc family of compilers, for example,
>>> has lacked such discipline.
>> Oh, I see. Thanks for clarifying that. I was wondering exactly why fdlibm
>> optimization is off even for x86_64 since, AFAICS with gcc 5 only, it does
>> not affect the precision, even if setting -O3 does not improve the
>> performance as much as on PPC64.
>
> The fdlibm code relies on aliasing a two-element array of int with a double
> to do bit-level reads and writes of floating-point values. As I understand
> it, the C spec allows compilers to assume values of different types don't
> overlap in memory. The compilation environment has to be configured in such
> a way that the C compiler disables code generation and optimization
> techniques that would run afoul of these fdlibm coding practices.
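For readers less familiar with the idiom mentioned above, here is a minimal Java sketch of the same "two 32-bit words" view of a double. It is only an illustration, not fdlibm code: the class and method names (DoubleWords, hi, lo, fromWords) are hypothetical, and it uses Double.doubleToRawLongBits instead of pointer aliasing, so it is not subject to the C aliasing rules discussed here.

```java
class DoubleWords {
    // High word: sign bit, 11-bit exponent, top 20 bits of the significand.
    static int hi(double x) {
        return (int) (Double.doubleToRawLongBits(x) >>> 32);
    }

    // Low word: the remaining 32 bits of the significand.
    static int lo(double x) {
        return (int) Double.doubleToRawLongBits(x);
    }

    // Rebuild a double from its two words (the "bit-level write" direction).
    static double fromWords(int hi, int lo) {
        long bits = ((long) hi << 32) | (lo & 0xFFFFFFFFL);
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double x = 1.0;
        // For 1.0 this prints hi=0x3ff00000 lo=0x00000000
        System.out.printf("hi=0x%08x lo=0x%08x%n", hi(x), lo(x));
        // Clearing the sign bit in the high word yields the absolute value.
        double y = -2.0;
        System.out.println(fromWords(hi(y) & 0x7fffffff, lo(y)));
    }
}
```

In C the equivalent reads and writes go through differently typed views of the same memory, which is where the compiler's aliasing assumptions come into play.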

After discussing this with the Power toolchain folks, we narrowed the issue
on PPC64 down to the FMA (fused multiply-add). -fno-strict-aliasing has no
effect: when used with aggressive optimization it does not solve the
precision issue. Thus -ffp-contract=off is the best option we have for now
to optimize fdlibm on PPC64.

>>> Methods in the Math class, such as pow, are often intrinsified and use a
>>> different algorithm so a straight performance comparison may not be as
>>> fair or meaningful in those cases.
>> I agree. It's just that the issue on StrictMath methods was first noted
>> due to that huge gap (Math vs StrictMath) on PPC64, which is not prominent
>> on x64.
>
> Depending on how Math.{sin, cos} is implemented on PPC64, compiling the
> fdlibm sin/cos with more aggressive optimizations should not be expected to
> close the performance gap. In particular, if Math.{sin, cos} is an
> intrinsic on PPC64 (I haven't checked the sources) that uses a
> platform-specific feature (say fused multiply add instructions) then just
> compiling fdlibm more aggressively wouldn't necessarily make up that gap.

In our case (PPC64) it does close the gap. Non-optimized code will suffer a
lot, for instance, from load-hit-store issues. Contrary to what happens on
PPC64, the gap on x64 seems to be quite small, as you said.

> To allow cross-platform and cross-release reproducibility, StrictMath is
> specified to use the particular fdlibm algorithms, which precludes using
> better algorithms developed more recently. If we were to start with a clean
> slate today, to get such reproducibility we would specify correctly-rounded
> behavior of all those methods, but such an approach was much less tractable
> technically 20+ years ago without benefit of the research that has been
> done in the interim, such as the work of Prof. Muller and associates:
> https://lipforge.ens-lyon.fr/projects/crlibm/.
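To make the FMA point above concrete, here is a small Java sketch of why contracting a*b + c into a single fused multiply-add can change the result bits. It is an illustration under stated assumptions, not the fdlibm code path: it needs Java 9 or later for Math.fma, the class and method names (FmaDemo, separate, fused) are hypothetical, and the inputs were picked by hand so that the intermediate product rounds.

```java
class FmaDemo {
    // Two roundings: the product is rounded to double, then the sum is rounded.
    static double separate(double a, double b, double c) {
        return a * b + c;
    }

    // One rounding: the exact value of a*b + c is rounded once.
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        double a = 1.0 + 0x1.0p-30;   // exactly representable
        double b = 1.0 - 0x1.0p-30;   // exactly representable
        double c = -1.0;
        // The exact product is 1 - 2^-60, which rounds to 1.0 as a double,
        // so the separate form gives 0.0 while the fused form keeps -2^-60.
        System.out.println(separate(a, b, c));                  // prints 0.0
        System.out.println(Double.toHexString(fused(a, b, c))); // prints -0x1.0p-60
    }
}
```

A C compiler that is allowed to contract floating-point expressions can turn the first form into the second on hardware with FMA instructions, which is the kind of difference -ffp-contract=off is meant to rule out.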
>
>>> Accumulating the results of the functions and comparing the sums is not
>>> a sufficiently robust way of checking to see if the optimized versions
>>> are indeed equivalent to the non-optimized ones.
>>> The specification of StrictMath requires a particular result for each set
>>> of floating-point arguments, and sums can round away low-order bits that
>>> differ.
>> That's a really good point, thanks for letting me know about that. I'll
>> re-test my change from that perspective.
>>
>>> Running the JDK math library regression tests and corresponding JCK tests
>>> is recommended for work in this area.
>> Got it. By "the JDK math library regression tests", which test suite do
>> you mean exactly? The jtreg tests?
>
> Specifically, the regression tests under test/java/lang/Math and
> test/java/lang/StrictMath in the jdk repository. There are some other math
> library tests in the hotspot repo, but I don't know where they are offhand.
>
> A note on methodologies: when I've been writing tests for my port I've
> tried to include test cases that exercise all the branch points in the
> code. Due to the large input space (~2^64 for a single-argument method),
> random sampling alone is an inefficient way to try to find differences in
> behavior.
>> For testing against JCK/TCK I'll need some help on that.
>
> I believe the JCK/TCK does have additional testcases relevant here.
>
> HTH; thanks,
>
> -Joe

Thank you very much for the valuable comments. I'll send a webrev accordingly
for review.

I filed a bug:
https://bugs.openjdk.java.net/browse/JDK-8170153

Best regards,
Gustavo
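
P.S. A minimal sketch of the per-argument, bit-exact comparison discussed above, as opposed to comparing sums. Everything here is hypothetical and only for illustration: the class BitExactCheck, its methods, and the two toy functions standing in for a reference and an "optimized" build. It is not a replacement for the jtreg or JCK tests.

```java
import java.util.function.DoubleUnaryOperator;

class BitExactCheck {
    // Counts the inputs for which the two functions differ in any result bit.
    // doubleToLongBits maps every NaN to one pattern but still tells
    // +0.0 and -0.0 apart.
    static int mismatches(DoubleUnaryOperator reference,
                          DoubleUnaryOperator candidate,
                          double[] inputs) {
        int count = 0;
        for (double x : inputs) {
            long expected = Double.doubleToLongBits(reference.applyAsDouble(x));
            long actual = Double.doubleToLongBits(candidate.applyAsDouble(x));
            if (expected != actual) {
                count++;
            }
        }
        return count;
    }

    // The weaker check: accumulate all results into a single sum.
    static double sum(DoubleUnaryOperator f, double[] inputs) {
        double total = 0.0;
        for (double x : inputs) {
            total += f.applyAsDouble(x);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, 2.0};
        // Toy stand-ins: the results differ only for x == 2.0, and only by
        // an amount far below one ulp of the running sum.
        DoubleUnaryOperator reference = x -> x == 2.0 ? 0x1.0p-60 : x;
        DoubleUnaryOperator candidate = x -> x == 2.0 ? 0x1.0p-61 : x;

        // The sums are identical, so a sum-based check reports no difference.
        System.out.println(sum(reference, inputs) == sum(candidate, inputs)); // prints true
        // The per-argument comparison finds the differing result.
        System.out.println(mismatches(reference, candidate, inputs));         // prints 1
    }
}
```

In a real run the two operators would be the results of the non-optimized and optimized builds, and the inputs would include cases chosen to hit each branch of the implementation.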