[ https://issues.apache.org/jira/browse/STATISTICS-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477335#comment-17477335 ]
Alex Herbert commented on STATISTICS-52: ---------------------------------------- I've put together a version of the normal distribution PDF that computes x^2 to extended precision. The extra bits can be used in the exp function. This has an increasingly large effect when x is far from zero. Here is a table of the mean ULP difference between the standard normal PDF and the one with extended precision. Data uses 10000 uniform values for each integer interval (i.e. x + [0, 1)): ||x||diff|| |0|0.0| |1|0.1398| |2|0.8404| |3|1.4129| |4|2.8729| |5|3.8279| |6|5.821| |7|5.7296| |8|11.4693| |9|11.4899| |10|11.5295| |11|19.3796| |12|23.4929| |13|22.7045| |14|22.9185| |15|23.0753| |16|46.582| |17|46.379| |18|46.3802| |19|46.1087| |20|45.5852| |21|46.2768| |22|63.8211| |23|91.9603| |24|92.9785| |25|92.4941| |26|92.6143| |27|93.1249| |28|91.7608| |29|93.1293| |30|92.437| |31|93.0956| |32|184.0093| |33|184.5708| |34|185.3242| |35|183.9677| |36|185.4451| |37|118.6374| |38|0.0| Note the differences are moving to a more correct result. I tested with 1000 random points in [1, 39) using a 128-bit reference implementation. The high accuracy method is with in 2 ulp of the expected result. Note that the normal distribution x-values are typically small. The use of extra bits does not work when 0.5*x*x is less than 1, i.e. x is less than sqrt(2). This covers 84.27% of the distribution. Using 100000 randomly distributed samples from a normal distribution the mean ULP difference is 0.062. So most of the time this makes no difference. It also means that most of the time the method does not have to compute to extended precision. Here's the method: {code:java} /** * Compute {@code exp(-0.5*x*x)} with high accuracy. * * @param x Value * @return exp(-0.5*x*x) */ private static double expmhxx(double x) { final double a = x * x; if (a <= 2) { // 84.27% of the time return Math.exp(-0.5 * a); } else if (a > 1491) { return 0; } // Working not shown to compute the round-off final double b = -0.5 * squareLowUnscaled(x, a); final double ea = Math.exp(-0.5 * a); // b ~ expm1(b) // => exp(-0.5*(a+b)) = exp(-0.5*a) * exp(-0.5*b) // = exp(-0.5*a) * expm1(-0.5*b) + exp(-0.5*a) return ea - 0.5 * b * ea; } {code} Note the magic number 1491 is larger than {{{}-2 * Math.log(Double.MIN_VALUE) == 1488.88{}}}. So often the method uses standard precision or detects the value is out-of-range. A quick JMH test calling the method with normally distributed random values: ||Method||Score|| |Baseline|4.176| |exp|21.950| |Extended precision|23.483| The baseline is computing the random normal value x but not any distribution PDF. The high-accuracy PDF is approximately 10% slower on normally distributed data. Another use case is to require the PDF far into the tails (max x ~ 38.5). On uniformly distributed data: ||Method||low||high||Score|| |Baseline|0|1|4.314| |exp|0|1|17.879| |Extended precision|0|1|19.191| |Baseline|0|10|4.278| |exp|0|10|17.900| |Extended precision|0|10|25.374| |Baseline|0|30|4.343| |exp|0|30|22.931| |Extended precision|0|30|24.915| |Baseline|0|100|4.288| |exp|0|100|16.588| |Extended precision|0|100|17.783| |Baseline|5|20|4.395| |exp|5|20|16.120| |Extended precision|5|20|20.118| The biggest difference is observed for the range {{[0, 10]}} where the extended precision method is about 55% slower. In this range the method must branch to compute either standard or extended precision. In the range {{[5, 20]}} the method always uses extended precision. This is about 33% slower. h2. Conclusion It is possible to increase the accuracy of the normal distribution PDF in the tails. This does not have much impact on runtime performance for normally distributed data. It also does not have much impact on accuracy for normally distributed data. Far into the tails this can increase accuracy as much as 200 ulp, which is around 3 significant digits, and the result is within a few ulp of the true result. This may have a runtime impact of around 50%. > High precision PDF for the Normal dsitribution > ---------------------------------------------- > > Key: STATISTICS-52 > URL: https://issues.apache.org/jira/browse/STATISTICS-52 > Project: Apache Commons Statistics > Issue Type: Improvement > Components: distribution > Affects Versions: 1.0 > Reporter: Alex Herbert > Priority: Minor > > The normal distribution PDF is computed using: > > {code:java} > Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI) > {code} > The value {{x^2}} can be computed to extended precision. This extra > information in the round-off bits can increase the accuracy of the > exponential function (see NUMBERS-177 under the title 'Accurate scaling by > exp(z*z)'). > > The effect of including the round-off bits on both accuracy and speed should > be investigated. > -- This message was sent by Atlassian Jira (v8.20.1#820001)