Hello,
On 11/16/2016 5:45 PM, Gustavo Romero wrote:
Hi,

Currently, optimization for building fdlibm is disabled, except for the "solaris" OS target [1].
The reason is that, historically, the Solaris compilers have had sufficient discipline and control over floating-point semantics and compiler optimizations to still implement the Java-mandated results when optimization was enabled. The gcc family of compilers, for example, has lacked such discipline.
As a consequence, on PPC64 (Linux), StrictMath methods such as (but not limited to) sin(), cos(), and tan() perform very poorly compared with the same methods in the Math class [2]:
If you are doing your work against JDK 9, note that the pow, hypot, and cbrt fdlibm methods required by StrictMath have been ported to Java (JDK-8134780: Port fdlibm to Java). I intend to port the remaining methods to Java, but it is unclear whether this will happen for JDK 9.
Methods in the Math class, such as pow, are often intrinsified and use a different algorithm so a straight performance comparison may not be as fair or meaningful in those cases.
               Math           StrictMath
               =========      ==========
sin            0m29.984s      1m41.184s
cos            0m30.031s      1m41.200s
tan            0m31.772s      1m46.976s
asin           0m4.577s       0m4.543s
acos           0m4.539s       0m4.525s
atan           0m12.929s      0m12.896s
exp            0m1.071s       0m4.570s
log            0m3.272s       0m14.239s
log10          0m4.362s       0m20.236s
sqrt           0m0.913s       0m0.981s
cbrt           0m10.786s      0m10.808s
sinh           0m4.438s       0m4.433s
cosh           0m4.496s       0m4.478s
tanh           0m3.360s       0m3.353s
expm1          0m4.076s       0m4.094s
log1p          0m13.518s      0m13.527s
IEEEremainder  0m38.803s      0m38.909s
atan2          0m20.100s      0m20.057s
pow            0m14.096s      0m19.938s
hypot          0m5.136s       0m5.122s

Switching on -O3 optimization can damage the precision of those methods. Nonetheless, on PPC64 it is possible to avoid that side effect and still get the large benefits of -O3 if -fno-expensive-optimizations is passed in addition to the -O3 flag. The following change is therefore proposed to resolve the issue:

diff -r 81eb4bd34611 make/lib/CoreLibraries.gmk
--- a/make/lib/CoreLibraries.gmk	Wed Nov 09 13:37:19 2016 +0100
+++ b/make/lib/CoreLibraries.gmk	Wed Nov 16 19:11:11 2016 -0500
@@ -33,10 +33,16 @@
 # libfdlibm is statically linked with libjava below and not delivered into the
 # product on its own.

-BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
+BUILD_LIBFDLIBM_OPTIMIZATION := NONE

-ifneq ($(OPENJDK_TARGET_OS), solaris)
-  BUILD_LIBFDLIBM_OPTIMIZATION := NONE
+ifeq ($(OPENJDK_TARGET_OS), solaris)
+  BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
+endif
+
+ifeq ($(OPENJDK_TARGET_OS), linux)
+  ifeq ($(OPENJDK_TARGET_CPU_ARCH), ppc)
+    BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
+  endif
 endif

 LIBFDLIBM_SRC := $(JDK_TOPDIR)/src/java.base/share/native/libfdlibm
@@ -51,6 +57,7 @@
     CFLAGS := $(CFLAGS_JDKLIB) $(LIBFDLIBM_CFLAGS), \
     CFLAGS_windows_debug := -DLOGGING, \
     CFLAGS_aix := -qfloat=nomaf, \
+    CFLAGS_linux_ppc := -fno-expensive-optimizations, \
     DISABLED_WARNINGS_gcc := sign-compare, \
     DISABLED_WARNINGS_microsoft := 4146 4244 4018, \
     ARFLAGS := $(ARFLAGS), \

diff -r 2a1f97c0ad3d make/common/NativeCompilation.gmk
--- a/make/common/NativeCompilation.gmk	Wed Nov 09 15:32:39 2016 +0100
+++ b/make/common/NativeCompilation.gmk	Wed Nov 16 19:08:06 2016 -0500
@@ -569,16 +569,19 @@
   $1_ALL_OBJS := $$(sort $$($1_EXPECTED_OBJS) $$($1_EXTRA_OBJECT_FILES))

   # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CFLAGS.
-  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS))
+  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CFLAGS_$(OPENJDK_TARGET_OS)) \
+      $$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH))
   ifneq ($(DEBUG_LEVEL),release)
     # Pickup extra debug dependent variables for CFLAGS
     $1_EXTRA_CFLAGS+=$$($1_CFLAGS_debug)
     $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_debug)
     $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_debug)
+    $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_debug)
   else
     $1_EXTRA_CFLAGS+=$$($1_CFLAGS_release)
     $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_release)
     $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_release)
+    $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_release)
   endif

   # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent variables for CXXFLAGS.

After enabling the optimization, it is possible to gain up to 3x in performance for the aforementioned methods without losing precision:

               StrictMath, original                   StrictMath, optimized
               ============================           ============================
sin            1.7136493465700542     1m41.184s       1.7136493465700542     0m33.895s
cos            0.1709843554185943     1m41.200s       0.1709843554185943     0m33.884s
tan            -5.5500322522995315E7  1m46.976s       -5.5500322522995315E7  0m36.461s
asin           NaN                    0m4.543s        NaN                    0m3.175s
acos           NaN                    0m4.525s        NaN                    0m3.211s
atan           1.5707961389886132E8   0m12.896s       1.5707961389886132E8   0m7.100s
exp            Infinity               0m4.570s        Infinity               0m3.187s
log            1.7420680845245087E9   0m14.239s       1.7420680845245087E9   0m7.170s
log10          7.565705562087342E8    0m20.236s       7.565705562087342E8    0m9.610s
sqrt           6.66666671666567E11    0m0.981s        6.66666671666567E11    0m0.948s
cbrt           3.481191648389617E10   0m10.808s       3.481191648389617E10   0m10.786s
sinh           Infinity               0m4.433s        Infinity               0m3.179s
cosh           Infinity               0m4.478s        Infinity               0m3.174s
tanh           9.999999971990079E7    0m3.353s        9.999999971990079E7    0m3.208s
expm1          Infinity               0m4.094s        Infinity               0m3.185s
log1p          1.7420681029451895E9   0m13.527s       1.7420681029451895E9   0m8.756s
IEEEremainder  502000.0               0m38.909s       502000.0               0m14.055s
atan2          1.570453905253704E8    0m20.057s       1.570453905253704E8    0m10.510s
pow            Infinity               0m19.938s       Infinity               0m20.204s
hypot          5.000000099033372E15   0m5.122s        5.000000099033372E15   0m5.130s

I believe that, since FC has passed but FEC has not, the change can be pushed after due scrutiny and review if a special exception approval is granted. Once it is in 9, I'll request the downport to 8.
Accumulating the results of the functions and comparing the sums is not a sufficiently robust way of checking whether the optimized versions are indeed equivalent to the non-optimized ones. The specification of StrictMath requires a particular result for each set of floating-point arguments, and sums can round away low-order bits that differ.
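To illustrate how a sum can hide a difference, here is a minimal sketch (the class and method names are hypothetical, not part of any JDK test). Two result vectors differ by one ulp in a single element; because doubles near 1e16 are spaced 2.0 apart, both sums round to the same value, while comparing the bit patterns from Double.doubleToLongBits flags the mismatch.

```java
// Hypothetical helper: contrasts a sum-based check with a bitwise check.
final class SumVsBits {

    // Naive check: accumulate each array and compare the totals.
    static boolean sumsAgree(double[] a, double[] b) {
        double sa = 0.0;
        double sb = 0.0;
        for (double x : a) sa += x;
        for (double x : b) sb += x;
        return sa == sb;
    }

    // Strict check: index of the first element whose bits differ, or -1 if all
    // elements are bit-identical. doubleToLongBits collapses every NaN to one
    // canonical pattern but still distinguishes -0.0 from 0.0.
    static int firstBitMismatch(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        for (int i = 0; i < a.length; i++) {
            if (Double.doubleToLongBits(a[i]) != Double.doubleToLongBits(b[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The second elements differ by exactly one ulp.
        double[] reference = {1e16, 0.5};
        double[] candidate = {1e16, Math.nextUp(0.5)};
        // Both 1e16 + 0.5 and 1e16 + nextUp(0.5) round to 1e16.
        System.out.println("sums agree: " + sumsAgree(reference, candidate));
        System.out.println("first bit mismatch at index "
                + firstBitMismatch(reference, candidate));
    }
}
```

Running main reports that the sums agree, yet the bitwise check finds the mismatch at index 1.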
Running the JDK math library regression tests and corresponding JCK tests is recommended for work in this area.
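As a rough complement to those tests (not a substitute for them), one could print a bitwise digest of each StrictMath function over a fixed set of arguments and diff the output of the original and optimized builds. The sketch below assumes a hypothetical StrictMathDigest class, an arbitrary seed, and an arbitrary argument distribution. It hashes Double.doubleToLongBits of every result with 64-bit FNV-1a, and it relies on java.util.Random having a fully specified algorithm, so every JVM sees the same inputs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Hypothetical harness: prints one bitwise digest per StrictMath function.
// Run it on the original and the optimized build and diff the outputs.
final class StrictMathDigest {

    // Fixed special values followed by pseudo-random arguments from a seeded Random.
    static double[] arguments(int randomCount, long seed) {
        double[] special = {0.0, -0.0, Double.MIN_VALUE, Double.MIN_NORMAL, 1.0, -1.0,
                Math.PI, Double.MAX_VALUE, Double.POSITIVE_INFINITY,
                Double.NEGATIVE_INFINITY, Double.NaN};
        double[] args = new double[special.length + randomCount];
        System.arraycopy(special, 0, args, 0, special.length);
        Random rnd = new Random(seed);
        for (int i = special.length; i < args.length; i++) {
            // Mantissa in [-1, 1) scaled by 2^k with k in [-30, 30].
            double m = 2.0 * rnd.nextDouble() - 1.0;
            int k = rnd.nextInt(61) - 30;
            args[i] = Math.scalb(m, k);
        }
        return args;
    }

    // 64-bit FNV-1a over the bytes of each result's canonical bit pattern.
    static long digest(DoubleUnaryOperator f, double[] args) {
        long h = 0xcbf29ce484222325L;
        for (double x : args) {
            long bits = Double.doubleToLongBits(f.applyAsDouble(x));
            for (int i = 0; i < 8; i++) {
                h ^= (bits >>> (8 * i)) & 0xffL;
                h *= 0x100000001b3L;
            }
        }
        return h;
    }

    public static void main(String[] args) {
        Map<String, DoubleUnaryOperator> fns = new LinkedHashMap<>();
        fns.put("sin", StrictMath::sin);
        fns.put("cos", StrictMath::cos);
        fns.put("tan", StrictMath::tan);
        fns.put("atan", StrictMath::atan);
        fns.put("exp", StrictMath::exp);
        fns.put("log", StrictMath::log);
        fns.put("log10", StrictMath::log10);
        double[] inputs = arguments(1_000_000, 42L);
        for (Map.Entry<String, DoubleUnaryOperator> e : fns.entrySet()) {
            System.out.printf("%-6s %016x%n", e.getKey(), digest(e.getValue(), inputs));
        }
    }
}
```

A differing digest only says that some result differs; locating the offending argument would need per-argument output, and the regression and JCK tests remain the authoritative check.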
Cheers, -Joe