Hi Joe,

Although I am neither a floating-point expert (as I think I've proven to you over the years) nor a gcc expert, I checked with our in-house gcc expert and got the following answer:
"Yes using -fno-strict-aliasing fixes the issues. Also there are many forks of fdlibm which has this fixed including the code inside glibc. " FWIW, - Derek -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-boun...@openjdk.java.net] On Behalf Of Chris Plummer Sent: Thursday, November 17, 2016 4:49 PM To: joe darcy <joe.da...@oracle.com>; Gustavo Romero <grom...@linux.vnet.ibm.com>; ppc-aix-port-...@openjdk.java.net; hotspot-...@openjdk.java.net; core-libs-...@openjdk.java.net Cc: build-dev <build-dev@openjdk.java.net> Subject: Re: PPC64: Poor StrictMath performance due to non-optimized compilation On 11/17/16 1:33 PM, joe darcy wrote: > Hi Gustavo, > > > On 11/17/2016 10:31 AM, Gustavo Romero wrote: >> Hi Joe, >> >> Thanks a lot for your valuable comments. >> >> On 17-11-2016 15:35, joe darcy wrote: >>>> Currently, optimization for building fdlibm is disabled, except for >>>> the "solaris" OS target [1]. >>> The reason for that is because historically the Solaris compilers >>> have had sufficient discipline and control regarding floating-point >>> semantics and compiler optimizations to still implement the >>> Java-mandated results when optimization was enabled. The gcc family >>> of compilers, for example, has lacked such discipline. >> oh, I see. Thanks for clarifying that. I was exactly wondering why >> fdlibm optimization is off even for x86_x64 as it, AFAICS regarding >> gcc 5 only, does not affect the precision, even if setting -O3 does >> not improve the performance as much as on PPC64. > > The fdlibm code relies on aliasing a two-element array of int with a > double to do bit-level reads and writes of floating-point values. As I > understand it, the C spec allows compilers to assume values of > different types don't overlap in memory. The compilation environment > has to be configured in such a way that the C compiler disables code > generation and optimization techniques that would run afoul of these > fdlibm coding practices. 
This is the strict aliasing issue, right? It's a long-standing problem with fdlibm that kept getting worse as gcc got smarter. IIRC, compiling with -fno-strict-aliasing fixes it, but it's been more than 12 years since I last dealt with fdlibm and compiler aliasing issues.

Chris

>
>>>> As a consequence on PPC64 (Linux) StrictMath methods like, but not
>>>> limited to, sin(), cos(), and tan() perform very poorly in
>>>> comparison to the same methods in Math class [2]:
>>> If you are doing your work against JDK 9, note that the pow, hypot,
>>> and cbrt fdlibm methods required by StrictMath have been ported to
>>> Java (JDK-8134780: Port fdlibm to Java). I have intentions to port
>>> the remaining methods to Java, but it is unclear whether or not this
>>> will occur for JDK 9.
>> Yes, I'm doing my work against 9. So is there any problem if I
>> proceed with my change? I understand that there is no conflict as
>> JDK-8134780 progresses and replaces the StrictMath methods by their
>> counterparts in Java.
>> Please advise.
>
> If I manage to finish the fdlibm C -> Java port in JDK 9, the changes
> you are proposing would eventually be removed as unneeded since the C
> code wouldn't be there to get compiled anymore.
>
>>
>> Is it intended to downport JDK-8134780 to 8?
>
> Such a backport would be technically possible, but we at Oracle don't
> currently plan to do so.
>
>>
>>
>>> Methods in the Math class, such as pow, are often intrinsified and
>>> use a different algorithm so a straight performance comparison may
>>> not be as fair or meaningful in those cases.
>> I agree. It's just that the issue on StrictMath methods was first
>> noted due to that huge gap (Math vs StrictMath) on PPC64, which is
>> not prominent on x64.
>
> Depending on how Math.{sin, cos} is implemented on PPC64, compiling
> the fdlibm sin/cos with more aggressive optimizations should not be
> expected to close the performance gap.
> In particular, if Math.{sin, cos} is an intrinsic on PPC64 (I haven't
> checked the sources) that uses platform-specific features (say, fused
> multiply-add instructions), then just compiling fdlibm more
> aggressively wouldn't necessarily make up that gap.
>
> To allow cross-platform and cross-release reproducibility, StrictMath
> is specified to use the particular fdlibm algorithms, which precludes
> using better algorithms developed more recently. If we were to start
> with a clean slate today, to get such reproducibility we would specify
> correctly-rounded behavior of all those methods, but such an approach
> was much less technically tractable 20+ years ago without benefit of
> the research that has been done in the interim, such as the work of
> Prof. Muller and associates: https://lipforge.ens-lyon.fr/projects/crlibm/.
>
>>
>>
>>> Accumulating the results of the functions and comparing the
>>> sums is not a sufficiently robust way of checking to see if the
>>> optimized versions are indeed equivalent to the non-optimized ones.
>>> The specification of StrictMath requires a particular result for
>>> each set of floating-point arguments, and sums can round away
>>> low-order bits that differ.
>> That's a really good point, thanks for letting me know about that.
>> I'll re-test my change under that perspective.
>>
>>
>>> Running the JDK math library regression tests and corresponding JCK
>>> tests is recommended for work in this area.
>> Got it. By "the JDK math library regression tests" you mean exactly
>> which test suite? The jtreg tests?
>
> Specifically, the regression tests under test/java/lang/Math and
> test/java/lang/StrictMath in the jdk repository. There are some other
> math library tests in the hotspot repo, but I don't know where they
> are offhand.
>
> A note on methodologies: when I've been writing tests for my port I've
> tried to include test cases that exercise all the branch points in
> the code.
> Due to the large input space (~2^64 for a single-argument
> method), random sampling alone is an inefficient way to try to find
> differences in behavior.
>> For testing against JCK/TCK I'll need some help on that.
>>
>
> I believe the JCK/TCK does have additional testcases relevant here.
>
> HTH; thanks,
>
> -Joe
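
On the point in the quoted text that summing results can hide low-order differences, the following is a small C sketch of the effect and of a per-result bit comparison. The helper names `sum_results` and `first_bit_mismatch` and the two data arrays are made up for illustration, and 64-bit IEEE 754 doubles are assumed. It is only meant to show why a sum-based check is weaker than comparing each result, not how the JDK regression tests are written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit IEEE 754 value. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical helper: add up the results, as a sum-based check would. */
double sum_results(const double *r, size_t n)
{
    volatile double sum = 0.0; /* volatile: round to double at every step */
    for (size_t i = 0; i < n; i++)
        sum = sum + r[i];
    return sum;
}

/* Hypothetical helper: index of the first result whose bits differ, or -1. */
long first_bit_mismatch(const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t bits_a, bits_b;
        memcpy(&bits_a, &a[i], sizeof bits_a);
        memcpy(&bits_b, &b[i], sizeof bits_b);
        if (bits_a != bits_b)
            return (long)i;
    }
    return -1;
}

/* Made-up data: the second entries differ by one unit in the last place. */
const double reference_results[] = { 1.0, 0x1.0p-60 };
const double optimized_results[] = { 1.0, 0x1.0000000000001p-60 };
```

With this data both sums round to exactly 1.0, so a sum comparison reports no difference, while `first_bit_mismatch` returns index 1. A bitwise comparison also distinguishes +0.0 from -0.0 and different NaN encodings, which may or may not be what a given test wants.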