Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog,dlog10,dexp iff 2.29 or greater on AArch64.

2022-04-06 Thread Tobias Hartmann
On Fri, 1 Apr 2022 15:38:36 GMT, Andrew Haley  wrote:

>> Will this patch change `java.lang.Math`, `java.lang.StrictMath` or both? 
>> I've noticed differences in iterative machine learning algorithms using exp 
>> & log across different JVMs and architectures which we try to track in 
>> [Tribuo](https://github.com/oracle/tribuo) by recording the JVM & arch in 
>> our model provenance objects. If this patch is integrated will there be an 
>> easy way to get (e.g. from `System.getProperty`) what implementation of exp 
>> is in use by the current JVM? Otherwise I won't be able to notify users that 
>> the model may not reproduce if they rerun the same computation on different 
>> versions of Linux with the same JVM & architecture.
> 
> Exactly so, and that is why this patch was never integrated. This was only 
> ever going to be about `java.lang.Math`, but we foundered on the rock of 
> monotonicity. Here's the spec:
> 
> "most methods with more than 0.5 ulp errors are required to be 
> semi-monotonic: whenever the mathematical function is non-decreasing, so is 
> the floating-point approximation, likewise, whenever the mathematical 
> function is non-increasing, so is the floating-point approximation. Not all 
> approximations that have 1 ulp accuracy will automatically meet the 
> monotonicity requirements."
> 
> We couldn't guarantee we'd meet the monotonicity requirements if we used 
> glibc libm, so this patch was, with some regret, abandoned.

@theRealAph Thanks for the summary. I closed the JBS issue as Won't Fix.

-

PR: https://git.openjdk.java.net/jdk/pull/3510


Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog,dlog10,dexp iff 2.29 or greater on AArch64.

2022-03-30 Thread Tobias Hartmann
On Tue, 25 May 2021 15:32:40 GMT, gregcawthorne  wrote:

>> Glibc 2.29 onwards provides optimised versions of log,log10,exp.
>> These functions have an accuracy of 0.9ulp or better in glibc
>> 2.29.
>> 
>> Therefore this patch adds code to parse, store and check
>> the runtime glibc version in os_linux.cpp/hpp.
>> This is then used to select the glibc implementation of
>> log, log10, exp at runtime for c1 and c2, iff we have
>> glibc 2.29 or greater.
>> 
>> This will ensure OpenJDK can benefit from future improvements
>> to glibc.
>> 
>> Glibc adheres to the ieee754 standard, unless stated otherwise
>> in its spec.
>> 
>> As there are no stated exceptions in the current glibc spec
>> for dlog, dlog10 and dexp, we can assume they currently follow
>> ieee754 (which testing confirms). As such, future versions of
>> glibc are unlikely to lose this compliance with ieee754.
>> 
>> W.r.t performance this patch sees ~15-30% performance improvements for
>> log and log10, with ~50-80% performance improvements for exp for the
>> common input range (which outputs real numbers). However, for the NaN
>> and inf output ranges we see a slow down of up to a factor of 2 for
>> some functions and architectures.
>> 
>> Due to this being the uncommon case we assert that this is a
>> worthwhile tradeoff.
>
> greg.cawtho...@arm.com
> 
> Should work

@gregcawthorne any plans to re-open and fix this?

-

PR: https://git.openjdk.java.net/jdk/pull/3510


Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog, dlog10, dexp iff 2.29 or greater on AArch64.

2021-05-25 Thread gregcawthorne
On Thu, 15 Apr 2021 08:33:47 GMT, gregcawthorne 
 wrote:

> Glibc 2.29 onwards provides optimised versions of log,log10,exp.
> These functions have an accuracy of 0.9ulp or better in glibc
> 2.29.
> 
> Therefore this patch adds code to parse, store and check
> the runtime glibc version in os_linux.cpp/hpp.
> This is then used to select the glibc implementation of
> log, log10, exp at runtime for c1 and c2, iff we have
> glibc 2.29 or greater.
> 
> This will ensure OpenJDK can benefit from future improvements
> to glibc.
> 
> Glibc adheres to the ieee754 standard, unless stated otherwise
> in its spec.
> 
> As there are no stated exceptions in the current glibc spec
> for dlog, dlog10 and dexp, we can assume they currently follow
> ieee754 (which testing confirms). As such, future versions of
> glibc are unlikely to lose this compliance with ieee754.
> 
> W.r.t performance this patch sees ~15-30% performance improvements for
> log and log10, with ~50-80% performance improvements for exp for the
> common input range (which outputs real numbers). However, for the NaN
> and inf output ranges we see a slow down of up to a factor of 2 for
> some functions and architectures.
> 
> Due to this being the uncommon case we assert that this is a
> worthwhile tradeoff.

greg.cawtho...@arm.com

Should work

-

PR: https://git.openjdk.java.net/jdk/pull/3510


Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog, dlog10, dexp iff 2.29 or greater on AArch64.

2021-05-25 Thread Andrew Haley
Greg, what's your email address? Everything I try bounces.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog, dlog10, dexp iff 2.29 or greater on AArch64.

2021-05-25 Thread Andrew Dinn
On Thu, 15 Apr 2021 08:33:47 GMT, gregcawthorne 
 wrote:

> Glibc 2.29 onwards provides optimised versions of log,log10,exp.
> These functions have an accuracy of 0.9ulp or better in glibc
> 2.29.
> 
> Therefore this patch adds code to parse, store and check
> the runtime glibc version in os_linux.cpp/hpp.
> This is then used to select the glibc implementation of
> log, log10, exp at runtime for c1 and c2, iff we have
> glibc 2.29 or greater.
> 
> This will ensure OpenJDK can benefit from future improvements
> to glibc.
> 
> Glibc adheres to the ieee754 standard, unless stated otherwise
> in its spec.
> 
> As there are no stated exceptions in the current glibc spec
> for dlog, dlog10 and dexp, we can assume they currently follow
> ieee754 (which testing confirms). As such, future versions of
> glibc are unlikely to lose this compliance with ieee754.
> 
> W.r.t performance this patch sees ~15-30% performance improvements for
> log and log10, with ~50-80% performance improvements for exp for the
> common input range (which outputs real numbers). However, for the NaN
> and inf output ranges we see a slow down of up to a factor of 2 for
> some functions and architectures.
> 
> Due to this being the uncommon case we assert that this is a
> worthwhile tradeoff.

> [ One thing: Java uses the term "semi-monotonic" to
> mean "whenever the mathematical function is non-decreasing, so is
> the floating-point approximation, likewise, whenever the
> mathematical function is non-increasing, so is the floating-point
> approximation." I don't really understand what distinction means. ]

I believe this is to allow for the fact that the function is continuous and the 
floating-point approximation is discrete.

Let F be the actual function and f the floating-point approximation. Assume we 
have two successive floating-point values x, x' and, without loss of 
generality, F(x) <= F(x'). What are the circumstances under which we require 
f(x) <= f(x')? Semi-monotonicity says that is only needed when F is 
non-decreasing on the interval [x, x']. Expressed more precisely, the condition 
that F is non-decreasing is

  for all y, z such that x <= y <= z <= x' : F(y) <= F(z).

In other words:

  If the graph only ever stays level or increases across the interval [x, x'], 
  then we must have f(x) <= f(x').

  If the graph wiggles *up* and *down* across the interval [x, x'], we can allow 
  f(x) > f(x').
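
The adjacent-value condition above can be probed empirically. What follows is a minimal Java sketch, not part of the patch: the names `SemiMonotonicCheck` and `countViolations` are hypothetical, and it assumes the mathematical function is non-decreasing over the sampled range (as log and exp are). It only walks a finite run of successive doubles, so a clean result is evidence, not a proof.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, for illustration only.
final class SemiMonotonicCheck {

    /**
     * Walks `steps` successive doubles starting at `start` and counts the
     * adjacent pairs (x, x') for which f(x') < f(x), i.e. where the
     * approximation decreases although the true function does not.
     */
    static long countViolations(DoubleUnaryOperator f, double start, long steps) {
        long violations = 0;
        double x = start;
        double fx = f.applyAsDouble(x);
        for (long i = 0; i < steps; i++) {
            double next = Math.nextUp(x);          // successor of x
            double fnext = f.applyAsDouble(next);
            if (fnext < fx) {
                violations++;
            }
            x = next;
            fx = fnext;
        }
        return violations;
    }

    public static void main(String[] args) {
        // Sample a run of doubles just above 1.0 for log and exp.
        System.out.println("log: " + countViolations(Math::log, 1.0, 1_000_000));
        System.out.println("exp: " + countViolations(Math::exp, 1.0, 1_000_000));
    }
}
```

A walk like this can only find counterexamples. It says nothing about the values it does not visit, which is why a proof is still needed for double precision.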

-

PR: https://git.openjdk.java.net/jdk/pull/3510


Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog, dlog10, dexp iff 2.29 or greater on AArch64.

2021-05-24 Thread gregcawthorne
On Thu, 15 Apr 2021 08:33:47 GMT, gregcawthorne 
 wrote:

> Glibc 2.29 onwards provides optimised versions of log,log10,exp.
> These functions have an accuracy of 0.9ulp or better in glibc
> 2.29.
> 
> Therefore this patch adds code to parse, store and check
> the runtime glibc version in os_linux.cpp/hpp.
> This is then used to select the glibc implementation of
> log, log10, exp at runtime for c1 and c2, iff we have
> glibc 2.29 or greater.
> 
> This will ensure OpenJDK can benefit from future improvements
> to glibc.
> 
> Glibc adheres to the ieee754 standard, unless stated otherwise
> in its spec.
> 
> As there are no stated exceptions in the current glibc spec
> for dlog, dlog10 and dexp, we can assume they currently follow
> ieee754 (which testing confirms). As such, future versions of
> glibc are unlikely to lose this compliance with ieee754.
> 
> W.r.t performance this patch sees ~15-30% performance improvements for
> log and log10, with ~50-80% performance improvements for exp for the
> common input range (which outputs real numbers). However, for the NaN
> and inf output ranges we see a slow down of up to a factor of 2 for
> some functions and architectures.
> 
> Due to this being the uncommon case we assert that this is a
> worthwhile tradeoff.

I have been reading the monotonicity paper suggested by Andrew Haley, to see 
whether I can prove that the current glibc implementations of log and exp are 
monotonic:
http://www-leland.stanford.edu/class/ee486/doc/ferguson1991.pdf

However, I have come to the conclusion that the paper calculates the relative 
error threshold an approximation must meet to be monotonic, and then relies on 
extra bits of floating-point hardware precision to guarantee monotonicity. The 
intermediate results carry more bits than the target representation's mantissa, 
and rounding them at the end (rounding is semi-monotonic) leads to a monotonic 
implementation. AArch64 provides no extra bits of precision between 
floating-point operations, so this paper won't help us in this case.
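
A first-order estimate of the Ferguson and Brightman threshold shows why extra precision is needed. This is a rough sketch and is not taken from the paper: it assumes f is differentiable and writes u = m+ - m for the spacing between adjacent machine numbers.

```latex
\frac{|f(m^{+}) - f(m)|}{|f(m^{+})| + |f(m)|}
  \;\approx\; \frac{|f'(m)|\,u}{2\,|f(m)|},
\qquad u = m^{+} - m .

% Example: for f = \exp we have f'/f = 1, so the threshold is about u/2.
% For doubles with m \in [1, 2), u = 2^{-52}, so the requirement is roughly
\varepsilon < 2^{-53}.
```

Rounding a result to double precision can by itself introduce a relative error of up to 2**-53. An approximation evaluated purely in double arithmetic therefore cannot be expected to meet a bound of this size, which would be consistent with the paper's reliance on wider intermediate precision.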

I am currently unsure how the current implementation (from fdlibm I believe) is 
proven to be monotonic. Perhaps there is a proof I am unaware of. The 
implementation has obviously stood the test of time at least.

If anyone has an idea of how the existing implementation was proven to be 
monotonic (if it has been), it could help us apply the same argument to the Arm 
Optimized Routines (AOR) version.

What we can do for now is compare the Remez approximation used in the current 
OpenJDK implementation with the one in AOR (take log for example):
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntimeTrans.cpp#L49

It states:
"The maximum error of this polynomial approximation is bounded by 2**-58.45"
This must be the theoretical accuracy, as the accuracy actually achieved on 
AArch64 would be bounded by the 52 bits of mantissa. It is insufficient for a 
proof of monotonicity on its own.

Now if you look at the AOR Sollya scripts for log (where the implementation 
comes from):
https://github.com/ARM-software/optimized-routines/blob/master/math/tools/log.sollya
https://github.com/ARM-software/optimized-routines/blob/master/math/tools/log_abs.sollya
and run them, you will see absolute and relative accuracies of ~2**-63 for 
log.sollya. For log_abs.sollya, which is used for inputs around 1.0, there is 
an absolute accuracy of ~2**-65 and a relative error of ~2**-56.

So for log the theoretical accuracy is actually higher than the current 
implementation's. Near 1.0 the relative error is slightly worse; however, I 
have confirmed with Szabolcs Nagy at Arm, who worked on the implementation, 
that it is the absolute error which dictates the effective accuracy here, as 
there is arithmetic afterwards which changes the magnitude.

As for exp, the code comments in the existing implementation state that the 
maximum theoretical error of the Remez approximation is 2**-59:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntimeTrans.cpp#L238

Running the exp Sollya script in AOR shows that its Remez approximation has a 
theoretical accuracy of 2**-66:
https://github.com/ARM-software/optimized-routines/blob/master/math/tools/exp.sollya

Another thing to consider is the reconstruction process of the current fdlibm 
implementation and of glibc's. glibc's exp and log use a table-lookup algorithm 
so that their polynomial only needs a smaller principal domain around 0, which 
is then transformed to a larger principal domain, whereas fdlibm's do not use 
this method.
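
To make the table-lookup idea concrete, here is a simplified Java sketch of a table-driven log. It is not glibc's or AOR's algorithm: the table size, the midpoint choice, the plain Taylor polynomial (in place of a Remez fit) and the names `TableLog` and `log` are all assumptions made for illustration, and it handles only positive, normal, finite inputs.

```java
// Illustrative sketch only; not the glibc/AOR implementation.
final class TableLog {
    private static final int N = 128;                  // table size (assumed)
    private static final double LN2 = Math.log(2.0);
    private static final double[] INV_C = new double[N];
    private static final double[] LOG_C = new double[N];

    static {
        for (int i = 0; i < N; i++) {
            double c = 1.0 + (i + 0.5) / N;            // midpoint of sub-interval i of [1, 2)
            INV_C[i] = 1.0 / c;
            LOG_C[i] = Math.log(c);                    // precomputed table entry
        }
    }

    /** Approximates log(x) for positive, normal, finite x. */
    static double log(double x) {
        int k = Math.getExponent(x);                   // x = 2^k * z
        double z = Math.scalb(x, -k);                  // z in [1, 2)
        int i = (int) ((z - 1.0) * N);                 // table index, 0..N-1
        double r = z * INV_C[i] - 1.0;                 // |r| <= 1/(2N): small principal domain
        // Degree-5 Taylor polynomial for log1p(r); a real implementation
        // would use a minimax (Remez) polynomial here.
        double p = r * (1.0 + r * (-0.5 + r * (1.0 / 3 + r * (-0.25 + r * 0.2))));
        // Reconstruction: log(x) = k*ln2 + log(c_i) + log1p(r)
        return k * LN2 + LOG_C[i] + p;
    }

    public static void main(String[] args) {
        for (double x : new double[] {0.5, 1.0, 3.7, 1e10}) {
            System.out.println(x + ": " + log(x) + " vs " + Math.log(x));
        }
    }
}
```

The polynomial only has to be accurate on a very narrow interval around 0, but the reconstruction step adds table values and a scaled constant. Any monotonicity argument therefore has to cover that arithmetic as well as the polynomial.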

A description of the table-lookup scheme can be found here:
https://dl.acm.org/doi/pdf/10.1145/63522.214389
An analysis of the error bounds of table-lookup approximations of functions 
can be found here:
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp==709374

If the remez polynomial of fdlibm isn’t definitively proven to be 

Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog, dlog10, dexp iff 2.29 or greater on AArch64.

2021-05-07 Thread Andrew Haley
On Thu, 29 Apr 2021 20:24:25 GMT, Carlos O'Donell 
 wrote:

> Where does the requirement for monotonicity come from?

Semi-monotonicity, to be precise. In the spec of java.lang.Math,

"Besides accuracy at individual arguments, maintaining proper relations between 
the method at different arguments is also important. Therefore, most methods 
with more than 0.5 ulp errors are required to be semi-monotonic: whenever the 
mathematical function is non-decreasing, so is the floating-point 
approximation, likewise, whenever the mathematical function is non-increasing, 
so is the floating-point approximation. Not all approximations that have 1 ulp 
accuracy will automatically meet the monotonicity requirements."

I wouldn't be surprised if the approximations we need in glibc meet this 
anyway. We just need to check.

-

PR: https://git.openjdk.java.net/jdk/pull/3510


Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog, dlog10, dexp iff 2.29 or greater on AArch64.

2021-05-06 Thread Joe Darcy



On 5/6/2021 5:21 AM, Carlos O'Donell wrote:

> On Thu, 15 Apr 2021 08:33:47 GMT, gregcawthorne wrote:
>
>> Glibc 2.29 onwards provides optimised versions of log,log10,exp.
>> These functions have an accuracy of 0.9ulp or better in glibc
>> 2.29.
>>
>> Therefore this patch adds code to parse, store and check
>> the runtime glibc version in os_linux.cpp/hpp.
>> This is then used to select the glibc implementation of
>> log, log10, exp at runtime for c1 and c2, iff we have
>> glibc 2.29 or greater.
>>
>> This will ensure OpenJDK can benefit from future improvements
>> to glibc.
>>
>> Glibc adheres to the ieee754 standard, unless stated otherwise
>> in its spec.
>>
>> As there are no stated exceptions in the current glibc spec
>> for dlog, dlog10 and dexp, we can assume they currently follow
>> ieee754 (which testing confirms). As such, future versions of
>> glibc are unlikely to lose this compliance with ieee754.
>>
>> W.r.t performance this patch sees ~15-30% performance improvements for
>> log and log10, with ~50-80% performance improvements for exp for the
>> common input range (which outputs real numbers). However, for the NaN
>> and inf output ranges we see a slow down of up to a factor of 2 for
>> some functions and architectures.
>>
>> Due to this being the uncommon case we assert that this is a
>> worthwhile tradeoff.
>
> Where does the requirement for monotonicity come from?


From the specifications:

"The computed result [of log] must be within 1 ulp of the exact result. 
Results must be semi-monotonic."


https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/lang/Math.html#log(double)

and similarly for the other method.

-Joe





Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog, dlog10, dexp iff 2.29 or greater on AArch64.

2021-05-06 Thread Carlos O'Donell
On Thu, 15 Apr 2021 08:33:47 GMT, gregcawthorne 
 wrote:

> Glibc 2.29 onwards provides optimised versions of log,log10,exp.
> These functions have an accuracy of 0.9ulp or better in glibc
> 2.29.
> 
> Therefore this patch adds code to parse, store and check
> the runtime glibc version in os_linux.cpp/hpp.
> This is then used to select the glibc implementation of
> log, log10, exp at runtime for c1 and c2, iff we have
> glibc 2.29 or greater.
> 
> This will ensure OpenJDK can benefit from future improvements
> to glibc.
> 
> Glibc adheres to the ieee754 standard, unless stated otherwise
> in its spec.
> 
> As there are no stated exceptions in the current glibc spec
> for dlog, dlog10 and dexp, we can assume they currently follow
> ieee754 (which testing confirms). As such, future versions of
> glibc are unlikely to lose this compliance with ieee754.
> 
> W.r.t performance this patch sees ~15-30% performance improvements for
> log and log10, with ~50-80% performance improvements for exp for the
> common input range (which outputs real numbers). However, for the NaN
> and inf output ranges we see a slow down of up to a factor of 2 for
> some functions and architectures.
> 
> Due to this being the uncommon case we assert that this is a
> worthwhile tradeoff.

Where does the requirement for monotonicity come from?

-

PR: https://git.openjdk.java.net/jdk/pull/3510


Re: RFR: 8265768 [aarch64] Use glibc libm impl for dlog, dlog10, dexp iff 2.29 or greater on AArch64.

2021-04-28 Thread gregcawthorne
On Wed, 28 Apr 2021 09:25:01 GMT, Andrew Haley  wrote:

> Re monotonicity: all is not necessarily lost. There's a theorem due to 
> Ferguson and Brightman which says that
> 
> ```
> if
>              abs(f(m+) - f(m))
>     eps < ------------------------
>            abs(f(m+)) + abs(f(m))
> 
> for all m, the approximation is monotone.
> 
> m is the machine number, m+ its successor
> eps the maximum relative error of the approximation f(m)
> ```
> 
> See
> http://www-leland.stanford.edu/class/ee486/doc/ferguson1991.pdf, particularly 
> the Appendix, which contains a table of expressions for the five basic 
> transcendental functions.

This can definitely be checked for single precision! I am uncertain how much 
water that would hold, though.
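
As an illustration of what a single-precision check could look like, here is a minimal Java sketch. The names `ExhaustiveFloatCheck`, `FloatFunction` and `firstDecrease` are hypothetical. Java exposes no single-precision log, so `(float) Math.log(x)` stands in for the routine under test; that stand-in is an assumption, not the glibc function. The sketch relies on the fact that positive floats are ordered the same way as their bit patterns, so all of them (about 2**31) can be enumerated directly.

```java
// Hypothetical helper, for illustration only.
final class ExhaustiveFloatCheck {

    interface FloatFunction {
        float apply(float x);
    }

    /**
     * Scans the floats whose bit patterns lie in [fromBits, toBits] (both
     * positive) and returns the bit pattern of the first value whose
     * successor maps to a smaller result, or -1 if no decrease is found.
     */
    static int firstDecrease(FloatFunction f, int fromBits, int toBits) {
        float prev = f.apply(Float.intBitsToFloat(fromBits));
        for (int bits = fromBits + 1; bits <= toBits; bits++) {
            float cur = f.apply(Float.intBitsToFloat(bits));
            if (cur < prev) {
                return bits - 1;
            }
            prev = cur;
        }
        return -1;
    }

    public static void main(String[] args) {
        int min = Float.floatToIntBits(Float.MIN_VALUE);   // smallest positive float
        int max = Float.floatToIntBits(Float.MAX_VALUE);   // largest finite float
        // Visits every positive finite float; this takes a while to run.
        int bad = firstDecrease(x -> (float) Math.log(x), min, max);
        System.out.println(bad == -1 ? "no decrease found" : "decrease after bits " + bad);
    }
}
```

The same enumeration is out of reach for doubles, which have about 2**63 positive values, so a result for single precision would not settle the question for double precision.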

-

PR: https://git.openjdk.java.net/jdk/pull/3510