[ 
https://issues.apache.org/jira/browse/STATISTICS-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477335#comment-17477335
 ] 

Alex Herbert commented on STATISTICS-52:
----------------------------------------

I've put together a version of the normal distribution PDF that computes x^2 to 
extended precision. The extra bits can be used in the exp function. This has an 
increasingly large effect when x is far from zero. Here is a table of the mean 
ULP difference between the standard normal PDF and the one with extended 
precision. Data uses 10000 uniform values for each integer interval (i.e. x + 
[0, 1)):
||x||diff||
|0|0.0|
|1|0.1398|
|2|0.8404|
|3|1.4129|
|4|2.8729|
|5|3.8279|
|6|5.821|
|7|5.7296|
|8|11.4693|
|9|11.4899|
|10|11.5295|
|11|19.3796|
|12|23.4929|
|13|22.7045|
|14|22.9185|
|15|23.0753|
|16|46.582|
|17|46.379|
|18|46.3802|
|19|46.1087|
|20|45.5852|
|21|46.2768|
|22|63.8211|
|23|91.9603|
|24|92.9785|
|25|92.4941|
|26|92.6143|
|27|93.1249|
|28|91.7608|
|29|93.1293|
|30|92.437|
|31|93.0956|
|32|184.0093|
|33|184.5708|
|34|185.3242|
|35|183.9677|
|36|185.4451|
|37|118.6374|
|38|0.0|

Note the differences are moving to a more correct result. I tested with 1000 
random points in [1, 39) using a 128-bit reference implementation. The high 
accuracy method is with in 2 ulp of the expected result.

Note that the normal distribution x-values are typically small. The use of 
extra bits does not work when 0.5*x*x is less than 1, i.e. x is less than 
sqrt(2). This covers 84.27% of the distribution. Using 100000 randomly 
distributed samples from a normal distribution the mean ULP difference is 
0.062. So most of the time this makes no difference. It also means that most of 
the time the method does not have to compute to extended precision. Here's the 
method:
{code:java}
    /**
     * Compute {@code exp(-0.5*x*x)} with high accuracy.
     *
     * @param x Value
     * @return exp(-0.5*x*x)
     */
    private static double expmhxx(double x) {
      final double a = x * x;
      if (a <= 2) {
        // 84.27% of the time
        return Math.exp(-0.5 * a);
      } else if (a > 1491) {
        return 0;
      }

      // Working not shown to compute the round-off
      final double b = -0.5 * squareLowUnscaled(x, a);

      final double ea = Math.exp(-0.5 * a);
      // b ~ expm1(b)
      // => exp(-0.5*(a+b)) = exp(-0.5*a) * exp(-0.5*b) 
      //                    = exp(-0.5*a) * expm1(-0.5*b) + exp(-0.5*a)
      return ea - 0.5 * b * ea;
    }
{code}
Note the magic number 1491 is larger than {{{}-2 * Math.log(Double.MIN_VALUE) 
== 1488.88{}}}. So often the method uses standard precision or detects the 
value is out-of-range.

A quick JMH test calling the method with normally distributed random values:
 
||Method||Score||
|Baseline|4.176|
|exp|21.950|
|Extended precision|23.483|

The baseline is computing the random normal value x but not any distribution 
PDF. The high-accuracy PDF is approximately 10% slower on normally distributed 
data.

Another use case is to require the PDF far into the tails (max x ~ 38.5). On 
uniformly distributed data:
||Method||low||high||Score||
|Baseline|0|1|4.314|
|exp|0|1|17.879|
|Extended precision|0|1|19.191|
|Baseline|0|10|4.278|
|exp|0|10|17.900|
|Extended precision|0|10|25.374|
|Baseline|0|30|4.343|
|exp|0|30|22.931|
|Extended precision|0|30|24.915|
|Baseline|0|100|4.288|
|exp|0|100|16.588|
|Extended precision|0|100|17.783|
|Baseline|5|20|4.395|
|exp|5|20|16.120|
|Extended precision|5|20|20.118|

The biggest difference is observed for the range {{[0, 10]}} where the extended 
precision method is about 55% slower. In this range the method must branch to 
compute either standard or extended precision.

In the range {{[5, 20]}} the method always uses extended precision. This is 
about 33% slower.
h2. Conclusion

It is possible to increase the accuracy of the normal distribution PDF in the 
tails. This does not have much impact on runtime performance for normally 
distributed data. It also does not have much impact on accuracy for normally 
distributed data. Far into the tails this can increase accuracy as much as 200 
ulp, which is around 3 significant digits, and the result is within a few ulp 
of the true result. This may have a runtime impact of around 50%.

 

> High precision PDF for the Normal dsitribution
> ----------------------------------------------
>
>                 Key: STATISTICS-52
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-52
>             Project: Apache Commons Statistics
>          Issue Type: Improvement
>          Components: distribution
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Minor
>
> The normal distribution PDF is computed using:
>  
> {code:java}
> Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI)
> {code}
> The value {{x^2}} can be computed to extended precision. This extra 
> information in the round-off bits can increase the accuracy of the 
> exponential function (see NUMBERS-177 under the title 'Accurate scaling by 
> exp(z*z)').
>  
> The effect of including the round-off bits on both accuracy and speed should 
> be investigated.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to