Hi Joe,

On 17-11-2016 19:33, joe darcy wrote:
>>>> Currently, optimization for building fdlibm is disabled, except for the
>>>> "solaris" OS target [1].
>>> The reason for that is because historically the Solaris compilers have had 
>>> sufficient discipline and control regarding floating-point semantics and 
>>> compiler optimizations to still implement the
>>> Java-mandated results when optimization was enabled. The gcc family of 
>>> compilers, for example, has lacked such discipline.
>> Oh, I see. Thanks for clarifying that. I was wondering exactly why fdlibm
>> optimization is off even for x86_64 since, AFAICS with gcc 5 only, it does
>> not affect the precision, even if setting -O3 does not improve the
>> performance as much as on PPC64.
> 
> The fdlibm code relies on aliasing a two-element array of int with a double 
> to do bit-level reads and writes of floating-point values. As I understand 
> it, the C spec allows compilers to assume values
> of different types don't overlap in memory. The compilation environment has 
> to be configured in such a way that the C compiler disables code generation 
> and optimization techniques that would run afoul
> of these fdlibm coding practices.
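
For readers who have not seen this pattern, here is a minimal sketch of the
kind of bit-level access being discussed. The macro mimics the aliasing style
in spirit (little-endian layout assumed) and is deliberately never used; the
helper names double_bits, high_word and low_word are hypothetical and are not
taken from fdlibm. Copying the bytes with memcpy is one way to read the same
bits without relying on how the compiler treats aliasing.

```c
#include <stdint.h>
#include <string.h>

/* Aliasing-style access, shown for contrast only and never used below.
 * Reading a double through an int lvalue is exactly what an optimizer
 * that assumes strict aliasing is allowed to break. */
#define HI_WORD_ALIASED(x) (*(1 + (int *)&(x)))

/* Hypothetical helper: copy the bytes of the double instead of aliasing. */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Upper 32 bits: sign, exponent and the top 20 fraction bits. */
static uint32_t high_word(double x)
{
    return (uint32_t)(double_bits(x) >> 32);
}

/* Lower 32 bits of the fraction. */
static uint32_t low_word(double x)
{
    return (uint32_t)(double_bits(x) & 0xffffffffu);
}
```

For example, high_word(1.0) is 0x3ff00000 and low_word(1.0) is 0.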

After discussing this with the Power toolchain folks, we narrowed the issue on
PPC64 down to FMA (fused multiply-add) contraction. -fno-strict-aliasing has no
effect: combined with aggressive optimization it does not solve the precision
issue. Thus -ffp-contract=off is the best option we have for now to optimize
fdlibm on PPC64.


>>> Methods in the Math class, such as pow, are often intrinsified and use a 
>>> different algorithm so a straight performance comparison may not be as fair 
>>> or meaningful in those cases.
>> I agree. It's just that the issue on StrictMath methods was first noted due 
>> to
>> that huge gap (Math vs StrictMath) on PPC64, which is not prominent on x64.
> 
> Depending on how Math.{sin, cos} is implemented on PPC64, compiling the 
> fdlibm sin/cos with more aggressive optimizations should not be expected to 
> close the performance gap. In particular, if
> Math.{sin, cos} is an intrinsic on PPC64 (I haven't checked the sources) that 
> uses a platform-specific feature (say fused multiply add instructions) then 
> just compiling fdlibm more aggressively wouldn't
> necessarily make up that gap.

In our case (PPC64) it does close the gap. Non-optimized code will suffer a lot,
for instance, from load-hit-store issues. Contrary to what happens on PPC64, the
gap on x64 seems to be quite small as you said.


> 
> To allow cross-platform and cross-release reproducibility, StrictMath is 
> specified to use the particular fdlibm algorithms, which precludes using 
> better algorithms developed more recently. If we were
> to start with a clean slate today, to get such reproducibility we would 
> specify correctly-rounded behavior of all those methods, but such an approach 
> was much less technically tractable 20+ years ago
> without the benefit of the research that has been done in the interim, such as 
> the work of Prof. Muller and associates: 
> https://lipforge.ens-lyon.fr/projects/crlibm/.
> 
>>
>>
>>> Accumulating the results of the functions and comparing the sums is 
>>> not a sufficiently robust way of checking to see if the optimized versions 
>>> are indeed equivalent to the non-optimized ones.
>>> The specification of StrictMath requires a particular result for each set 
>>> of floating-point arguments, and summing can round away low-order bits that 
>>> differ.
>> That's a really good point, thanks for letting me know about that. I'll
>> re-test my change from that perspective.
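
A minimal sketch of the difference between the two checks, assuming the
reference and optimized results have already been collected into arrays. The
helper names same_bits, first_mismatch and sum are hypothetical. Comparing
bit patterns element by element catches a low-order difference that a
comparison of sums can miss.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True only if x and y have identical bit patterns. Note that this
 * treats 0.0 and -0.0, and NaNs with different payloads, as different. */
static int same_bits(double x, double y)
{
    uint64_t bx, by;
    memcpy(&bx, &x, sizeof bx);
    memcpy(&by, &y, sizeof by);
    return bx == by;
}

/* Index of the first element whose bits differ, or n if none do. */
static size_t first_mismatch(const double *expected, const double *actual,
                             size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!same_bits(expected[i], actual[i]))
            return i;
    }
    return n;
}

/* Plain left-to-right accumulation, as in the sum-based check. */
static double sum(const double *v, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

For instance, the arrays {1.0, 0x1p-60} and {1.0, 0x1.0000000000001p-60} have
the same sum (exactly 1.0), yet first_mismatch() reports index 1.
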
>>
>>
>>> Running the JDK math library regression tests and corresponding JCK tests 
>>> is recommended for work in this area.
>> Got it. Which test suite exactly do you mean by "the JDK math library
>> regression tests"? The jtreg tests?
> 
> Specifically, the regression tests under test/java/lang/Math and 
> test/java/lang/StrictMath in the jdk repository. There are some other math 
> library tests in the hotspot repo, but I don't know where
> they are offhand.
> 
> A note on methodology: when I've been writing tests for my port I've tried 
> to include test cases that exercise all the branch points in the code. Due 
> to the large input space (~2^64 for a
> single-argument method), random sampling alone is an inefficient way to try 
> to find differences in behavior.
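
One way to build such inputs, sketched under the assumption that a branch in
the code compares the high 32 bits of a positive, finite double against a
constant. The helper names are hypothetical, and the threshold 0x3e400000
(the high word of 2^-27) is used purely for illustration. The sketch
generates the three doubles that straddle the threshold so that both sides of
the branch get exercised.

```c
#include <stdint.h>
#include <string.h>

/* Assemble a double from its high and low 32-bit words. */
static double double_from_words(uint32_t hi, uint32_t lo)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* For a positive, finite high-word threshold (hi_threshold > 0), fill
 * out[] with: the largest double just below the threshold, the
 * threshold value itself, and the next double above it. */
static void straddle_threshold(uint32_t hi_threshold, double out[3])
{
    out[0] = double_from_words(hi_threshold - 1, 0xffffffffu);
    out[1] = double_from_words(hi_threshold, 0u);
    out[2] = double_from_words(hi_threshold, 1u);
}
```

Calling straddle_threshold(0x3e400000u, pts) yields pts[1] == 0x1p-27 with
pts[0] and pts[2] as its immediate neighbours.
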
>> For testing against JCK/TCK I'll need some help on that.
>>
> 
> I believe the JCK/TCK does have additional testcases relevant here.
> 
> HTH; thanks,
> 
> -Joe
> 

Thank you very much for the valuable comments.

I'll send a webrev accordingly for review.

I filed a bug: https://bugs.openjdk.java.net/browse/JDK-8170153


Best regards,
Gustavo
