[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449039#comment-17449039 ] Alex Herbert commented on NUMBERS-174: -- New code for review in [PR 110|https://github.com/apache/commons-numbers/pull/110] Note: Currently the Travis CI integration with Github is not working for PRs. Travis will build master but does not pick up PRs. So this should be verified locally. I tried changing .travis.yml to update the JDK environment from openjdk9 to openjdk11 but this was not the cause since the master branch builds but not the PR branch. No other changes to the .travis.yml have been done since the last PR build 1 month ago ([Travis Commons Numbers PRs|https://app.travis-ci.com/github/apache/commons-numbers/pull_requests]). > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444012#comment-17444012 ] Alex Herbert commented on NUMBERS-174: -- OK. I'll put the code together as a PR for review. I can put the new code in one commit and then update the public API in a second. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444008#comment-17444008 ] Gilles Sadowski commented on NUMBERS-174: - bq. are you suggesting not exposing the computation of the incomplete gamma function? No. I just prefer {quote} {code} IncompleteGamma.Lower.value(double a, double x); {code} {quote} over {quote} {code} IncompleteGamma.lower(double a, double x); {code} {quote} bq. LanczosApproximation [...] could be marked as deprecated [...] +1 > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443993#comment-17443993 ] Alex Herbert commented on NUMBERS-174: -- {quote}but I would not define any other such functions (i.e. Lower.value(...) and Upper.value(...) {quote} So are you suggesting not exposing the computation of the incomplete gamma function? These functions do overflow with moderate arguments as the value of gamma(z) used for normalisation grows more than exponentially. Matlab does not have the standard functions. It has the incomplete gamma functions as the regularised versions. Then it provides scaled versions. See [Matlab gammainc|https://uk.mathworks.com/help/matlab/ref/gammainc.html]. {code:java} p(a, z) scaled = p(a, z) * gamma(a+1) e^z / x^a {code} "This scaling cancels out the asymptotic behavior of the function near 0, which avoids underflow with small arguments." Leaving out the standard incomplete gamma functions would add only the following to the API: {code:java} LogGamma.value(z, int[] sign); RegularizedGamma.P.derivative(double a, double x); RegularizedGamma.Q.derivative(double a, double x); /* Optional */ {code} The derivative is all that is required for the Gamma function to be used in the statistics project for the PDF and log PDF. I am fine with adding just this to the API. Then looking at the Complementary functions API. The usage of the Gamma functions in statistics requires the regularized gamma function with fixed a (Gamma, Nakagami) or fixed z (Poisson). Note that the RegularizedBeta class has a complement too which is not currently in the API (see NUMBERS-169). There is also this old ticket from Math regarding accuracy of the RegularizedBeta NUMBERS-52. So my next step was to look at porting the Boost beta function. This could also make used of the Complementary functions interface. The Boost beta functions have standard and regularised functions and the derivative of the regularised function so porting them should be able to use a similar process. See: [Boost incomplete beta|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_beta/ibeta_function.html] [Boost beta derivative|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_beta/beta_derivative.html] I also note that there is this ticket regarding the LanczosApproximation: [NUMBERS-47]. Since the class is only public for use in statistics then it could be marked as deprecated when statistics no longer depends on it and uses the new gamma function to compute the derivative. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. T
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443951#comment-17443951 ] Alex Herbert commented on NUMBERS-174: -- h2. Log of the Gamma P Derivative I tried this with a few methods: ||Method||Description|| |Logs|a * log(z) - log(z) - x - lgamma(a)| |Log(P Derivative)|log(Gamma P Derivative) if finite non zero; else use logs| |Logs (high precision sum)|a * log(z) - log(z) - x - lgamma(a) summed in high precision using BigDecimal| h3. Results ||Data||Logs|| ||Log(P derivative)|| ||Logs (high precision sum)|| || || ||max||RMS||max||RMS||max||RMS|| |LOG_GAMMA_P_DERIV1_SMALL|6.35E+11|4.15E+10|1.69E+10|1.20E+09|4.64E+11|3.56E+10| |LOG_GAMMA_P_DERIV1_INT|48.0818|8.37842|1.65059|0.384271|24.9378|5.83872| |LOG_GAMMA_P_DERIV1_MED|174.155|32.6245|5.94376|0.422254|153.955|28.5913| |LOG_GAMMA_P_DERIV1_BIG|1.02E+06|120858|37922.7|2787.94|1.01E+06|108370| The small data has some values very close to 0. These are not computed well by any method. The cancellation errors are significant. The error for 1e9 ulps is after about 23 bits of the 53-bit mantissa, or around 1e-9 relative error. The logs method is not as accurate as using the log of the standard result. This is especially true for the big data. Summing in high precision only marginally reduces the error. It is better to use the log of the derivative if that can be computed. I tried extracting all the logic from the method to compute the gamma P derivative into a method that directly returned the log result. This avoids a round trip of log(exp(result)). The method had the same accuracy but it would be faster. Creating a dedicated method is only required for performance reasons. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443864#comment-17443864 ] Gilles Sadowski commented on NUMBERS-174: - What would be the preferred API, considering the more common use case(s) for this family of functions? I mean, does it make sense to assume that the {{ComplementaryFunctions}} interface (cf. NUMBERS-167) would cover most needs? We could define an old-style _static_ function for all the new functions (since {{value}} is already part of the public API for the existing functionality), but I would not define any other such functions (i.e. {{Lower.value(...)}} and {{Upper.value(...)}} vs, respectively, {{upper(...)}} and {{lower(...)}} in {{{}IncompleteGamma{}}}). Wrt {{{}derivative{}}}, how about {code:java} public interface ComplementaryFunctions { double value(double x); double complement(double x); default double derivative(double x) { throw new UnsupportedOperationException(); } } {code} ? > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443805#comment-17443805 ] Alex Herbert commented on NUMBERS-174: -- h2. API Updates The following functions are currently in the public API: {code:java} Gamma.value(double x); LogGamma.value(double x); RegularizedGamma.P.value(double a, double x); RegularizedGamma.P.value(double a, double x, double epsilon, int maxIterations); RegularizedGamma.Q.value(double a, double x); RegularizedGamma.Q.value(double a, double x, double epsilon, int maxIterations); {code} These functions can be updated to call the Boost gamma implementation. The new implementation does support the epsilon and max iterations for series computations where applicable. In most cases the number of iterations is small (<100) or else other evaluation methods are used. It is not expected to require controlling the iterations but accuracy can be adjusted for performance. This is used in the Boost code to evaluate the CDF and the PDF together to relatively low accuracy for use in a fast gradient based inverse function. In the case of LogGamma the new implementation supports negative x values whereas the current one does not. This can be exposed using: {code:java} LogGamma.value(double x, int[] sign); {code} The function returns log(abs(gamma( x )). The sign parameter will be set on output to the sign of the gamma function result if the argument in a non zero length array. There are 3 new functions with suggested names: {code:java} IncompleteGamma.Lower.value(double a, double x); IncompleteGamma.Lower.value(double a, double x, double epsilon, int maxIterations); IncompleteGamma.Upper.value(double a, double x); IncompleteGamma.Upper.value(double a, double x, double epsilon, int maxIterations); RegularizedGamma.P.derivative(double a, double x); RegularizedGamma.Q.derivative(double a, double x); {code} Note that RegularizedGamma.Q.derivative(double a, double x) is the negation of RegularizedGamma.P.derivative(double a, double x). It can be omitted or added for completeness. Note that the current RegularizedGamma P and Q classes are static inner classes. These could be deprecated and replaced with: {code:java} IncompleteGamma.P.value(double a, double x); IncompleteGamma.P.value(double a, double x, double epsilon, int maxIterations); IncompleteGamma.Q.value(double a, double x); IncompleteGamma.Q.value(double a, double x, double epsilon, int maxIterations); {code} The IncompleteGamma could be an enum with 4 instances: Upper, Lower, P, Q. However using an enum eliminates the possibility to add methods to one instance of the enum. Thus 'P' could not have the additional method 'derivative(double a, double x)' which does not apply to all the other classes. The simpler approach is to maintain the RegularizedGamma class and add an IncompleteGamma class with Upper and Lower inner classes, or upper and lower methods: {code:java} IncompleteGamma.lower(double a, double x); IncompleteGamma.lower(double a, double x, double epsilon, int maxIterations); IncompleteGamma.upper(double a, double x); IncompleteGamma.upper(double a, double x, double epsilon, int maxIterations); {code} h2. Lanczos Approximation The current API exposes a Lanczos approximation. The Boost code has its own Lanczos approximation. The results of the gamma function are on average more accurate when using the Boost variant. The Boost variant also has two evaluation methods, one used for gamma(z) and one for logGamma(z). However the two approximations are not inter-changeable. A minor change is they use a different Lanczos constant g. A major change is some constant part of the multiplication factor is incorporated into the rational approximation for the Lanczos sum. Thus any code using the Lanczos sum and multiplying by a factor will have to update the factor: [Commons Numbers Lanczos v1.0|https://commons.apache.org/proper/commons-numbers/commons-numbers-gamma/javadocs/api-1.0/org/apache/commons/numbers/gamma/LanczosApproximation.html] {noformat} gamma(z+1) = sqrt(2*pi) * (z + g + 0.5)^(z + 0.5) * e^-(z + g + 0.5) * lanczos(z) gamma(z) = sqrt(2*pi) * (z + g + 0.5)^(z + 0.5) * e^-(z + g + 0.5) * lanczos(z) / z{noformat} [Boost Lanczos 1_77_0|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/lanczos.html] "Written in partial fraction form with the leading constants absorbed by the coefficients in the sum": {noformat} gamma(z) = (z + g - 0.5)^(z - 0.5) * e^-(z + g - 0.5) * lanczos(z){noformat} I would also estimate that the current implementation is faster based on the number of floating point operations. This is not confirmed. Options would be: # Expose the new version in a new API and optionally deprecate LanczosApproximation # Keep the LanczosApproximation and leave the Boost implementation package private h4. Usage of LanczosApproximation
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443765#comment-17443765 ] Alex Herbert commented on NUMBERS-174: -- h2. Using Dfp to increase precision The Dfp class provided arbitrary precision floating-point arithmetic (DFP = Decimal Floating Point library for Java). I have testing using this class with the version in Commons Math 3 to perform the computation of the regularised gamma prefix: {noformat} x^a e^-x / gamma(a){noformat} I used 64 digits of precision which is enough for the current test data. Results: ||Test||Standard|| ||Dfp|| ||Difference|| || || ||max||RMS||max||RMS||max||RMS|| |GAMMA_P_DERIV_INT|3.16382|0.974935|3.16382|0.974935|0|0| |GAMMA_P_DERIV_SMALL|3.12638|0.901515|3.12638|0.901515|0|0| |GAMMA_P_DERIV_MED|4.39499|1.00498|4.39499|1.00498|0|0| |GAMMA_P_DERIV_BIG|538.126|52.8391|2.43417|0.796512|-535.69183|-52.042588| |IGAMMA_P_INT|3.33453|0.837093|3.33453|0.837093|0|0| |IGAMMA_P_SMALL|2.51022|0.707713|2.51022|0.707713|0|0| |IGAMMA_P_MED|4.31041|0.743446|4.31041|0.743446|0|0| |IGAMMA_P_BIG|412.953|31.3717|3.92196|0.786162|-409.03104|-30.585538| |IGAMMA_P_EXTRA|0.502906|0.12985|0.502906|0.12985|0|0| |IGAMMA_Q_INT|4.83677|1.13616|4.83677|1.13616|0|0| |IGAMMA_Q_SMALL|3.74672|0.961876|3.74672|0.961876|0|0| |IGAMMA_Q_MED|5.35148|0.82354|5.35148|0.82354|0|0| |IGAMMA_Q_BIG|537.101|42.6655|5.7472|0.917859|-531.3538|-41.747641| |IGAMMA_Q_EXTRA|54.6793|15.1705|54.6793|15.1705|0|0| |IGAMMA_LARGE_X_ASYMP_TERM|276.954|103.997|276.954|103.997|0|0| |IGAMMA_LARGE_X_Q|537.101|120.441|1.0676|0.420528|-536.0334|-120.020472| |IGAMMA_LARGE_X_P|0.772756|0.229714|0.772756|0.229714|0|0| |IGAMMA_LARGE_X_Q_ASYM|537.101|170.353|344.395|119.68|-192.706|-50.673| |IGAMMA_LARGE_X_P_ASYM|341.213|98.6633|337.213|97.9287|-4|-0.7346| |IGAMMA_LOWER_INT|5.41081|1.07932|5.41081|1.07932|0|0| |IGAMMA_LOWER_SMALL|1.38617|0.472474|1.38617|0.472474|0|0| |IGAMMA_LOWER_MED|5.99945|1.15425|5.99945|1.15425|0|0| |IGAMMA_LOWER_BIG|7.29508|1.1326|7.29508|1.1326|0|0| |IGAMMA_LOWER_EXTRA|4.45394|1.19886|4.45394|1.19886|0|0| |IGAMMA_LARGE_X_ASYMP_TERM|276.954|103.997|276.954|103.997|0|0| |IGAMMA_LARGE_X_Q|537.101|120.441|1.0676|0.420528|-536.0334|-120.020472| |IGAMMA_LARGE_X_P|0.772756|0.229714|0.772756|0.229714|0|0| |IGAMMA_LARGE_X_Q_ASYM|537.101|170.353|344.395|119.68|-192.706|-50.673| |IGAMMA_LARGE_X_P_ASYM|341.213|98.6633|337.213|97.9287|-4|-0.7346| |IGAMMA_UPPER_INT|5.1089|1.36073|5.1089|1.36073|0|0| |IGAMMA_UPPER_SMALL|2.2915|0.78016|2.2915|0.78016|0|0| |IGAMMA_UPPER_MED|8.2182|1.68102|8.2182|1.68102|0|0| |IGAMMA_UPPER_BIG|7.28961|1.1512|7.28961|1.1512|0|0| |IGAMMA_UPPER_EXTRA|11.1293|3.59124|11.1293|3.59124|0|0| Note that this change only affects the regularized incomplete gamma functions P and Q. In the case of the standard functions the result is typically 0 or infinity and so extended precision makes no difference. In cases where the error is due to cancellation in the power terms using the Dfp class eliminates this error. However there are some cases where the error is in the other part of the computation, for example the series summation for large x which can be computed via the asymptotic approximation (all cases using LARGE_X in the name). These errors could only be eliminated by using extended precision for the remaining parts of the computation. Using Dfp throughout the computation would have a big performance impact and require significant code changes. The cases where Dfp can improve the computation of the power terms are difficult to predict from the input arguments. For simplicity the method should use double precision and accept the loss of precision for large arguments. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442877#comment-17442877 ] Alex Herbert commented on NUMBERS-174: -- {quote}To be sure: Do you always copy those explanations over to the source files? {quote} Yes. But usually it works the other way as I start with comments in the source files, investigate and comment on what has been updated and why, then copy the findings to the Jira ticket. In this case the easy option would be to disable this evaluation method and effectively set the computation back to the Boost version from 1_68_0 (pre Dec 2018). But I wanted to find out if it could be used and when. I think there is a definite case for leaving the method in. But it should be used more conservatively. I will raise this finding as an issue with the Boost maintainers. I think the ported code is ready to merge, following some review. There is the outstanding issue of whether to merge with its own continued fraction implementation and then update it pending the resolution of NUMBERS-175. I also investigated using Dfp to compute the power functions in extended precision. I can post up results of that too. It works very well to eliminate errors. I have no idea on performance implications but I think it would be significant. It is probably best to leave it out and think of it as a possible future enhancement. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442831#comment-17442831 ] Gilles Sadowski commented on NUMBERS-174: - As usual: Thanks for all this work! To be sure: Do you always copy those explanations over to the source files? > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442702#comment-17442702 ] Alex Herbert commented on NUMBERS-174: -- h2. Incomplete Gamma evaluation for Large X I found a bug in the implementation when evaluating the incomplete gamma for certain parameters. When x is large then the incomplete gamma function has an asymptotic approximation: [Digital Library of Mathematical Functions: Incomplete Gamma: 8.11|https://dlmf.nist.gov/8.11#E2] This function is evaluated when: {noformat} (x > 1000) && ((a < x) || (Math.abs(a - 50) / x < 1)){noformat} The result is a summation of terms: {noformat} t_{n+1} = t_n * (a - n - 1) / z ;t_0 = 1{noformat} In theory this works if a < z or after a few iterations the term is less than 1. Eventually the term will reduce to zero. But if a ~ z then the term starts very close to 1. It takes a long time to reduce to zero especially if the values are big, e.g. this will not converge within a million iterations: a = 1e30, z = nextUp(1e30). The closer a is in magnitude to z then the longer this series takes to converge. The longer it takes to converge the worse the result due to the limited precision of the terms computed via recursion and accumulated rounding error. This method was added to address the issue of computing incorrect values when an extremely large x was used (see [Boost Math issue 168|https://github.com/boostorg/math/issues/168]), e.g. {noformat} gamma_p(22.5, 1e300){noformat} For this case the first term is zero and the series would compute the result as 1. However for this case the routine detects that the prefix for the regularised gamma function is 0 and does not even evaluate the series anyway. The routine to compute the series is only used when the prefix for the regularised gamma function is non-zero. This requires terms a and z to be similar magnitudes. However there already exists a function to compute the result when a and z are similar magnitudes. So removal of the asymptotic evaluation method improves the results on the current test suite. However there are situations where it is an advantage to use this method as it is faster than the alternative. So I have looked into how to preserve the method. In a simplification the term n can be approximated as: {noformat} term_n = (a/z)^n{noformat} This holds if a is extremely large as the subtraction of values from a will not change its magnitude very much. Thus the number of iterations is approximately: {noformat} n = log(term_n) / log(a/z){noformat} Since the series starts at 1 and increases then it will definitely stop when term_n is ulp(1.0). Setting term_n to ulp(1.0) = 2^-52 we get: ||a / z||a / z||Iterations|| |1 - 2^-1|0.5|52| |1 - 2^-2|0.75|125| |1 - 2^-3|0.875|270| |1 - 2^-4|0.9375|558| |1 - 2^-5|0.96875|1135| |1 - 2^-6|0.984375|2289| |1 - 2^-53|0.|3.25e17| So it is not advised to use this method using the condition a < z without further checks. I have set the condition to the far more conservative: {noformat} (x > 1000) && (a < x * 0.75){noformat} So for large values the series will compute at most approximately 125 iterations. Given the other evaluation methods that use a series have maximum iterations in the double digits then this seems an appropriate limit. When using this updated threshold the method is executed in the current test suite with the following iterations for: {noformat} 664.0791015625 1328.158203125 49 1169.2916259765625 2338.583251953125 50 2057.796630859375 4115.59326171875 51{noformat} These values for a / x ~ 0.5 match with the approximation for the number of iterations. For these cases the final result is the same or better than not using the asymptotic approximation. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > com
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442634#comment-17442634 ] Alex Herbert commented on NUMBERS-174: -- {quote}Wouldn't it be safer/cleaner to deprecate the {{ContinuedFraction}} class {quote} Ticket created NUMBERS-175. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441990#comment-17441990 ] Gilles Sadowski commented on NUMBERS-174: - bq. [...] possible API additions for ContinuedFraction Wouldn't it be safer/cleaner to deprecate the {{ContinuedFraction}} class, and define * RegularContinuedFraction * GeneralizedContinuedFraction based on the nomenclature used on the [Wikipedia article| https://en.wikipedia.org/wiki/Continued_fraction] ? > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441736#comment-17441736 ] Alex Herbert commented on NUMBERS-174: -- The Boost [continued fraction|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/internals/cf.html] documentation has some more information on the reason for two variations of the ContinuedFraction. {noformat} A: a1 -- b1 + a2 - b2 +a3 b3 + ... B: b0 +a1 -- b1 + a2 - b2 +a3 b3 + ... This latter form is somewhat more natural in that it corresponds with the usual definition of a continued fraction, but note that the first a value returned by the generator is discarded. Further, often the first a and b values in a continued fraction have different defining equations to the remaining terms, which may make the "_a" suffixed form more appropriate. {noformat} There are some examples given which all use continued fraction B with the exception of the upper incomplete gamma function Q. {noformat} In this case the first couple of terms are different from the rest, so our fraction will start with the first "regular" a term. {noformat} So the use of the two variations allows for special cases where computation of the leading terms is not part of a regular series. Note that the Boost code can also evaluate the function for a simple continued fraction where the a's are all 1: {noformat} A: 1 b0 + -- b1 + 1 - b2 +1 b3 + ... {noformat} This is done using some meta programming to allow the generator to return either a single or pair of values and then extracting a and b from the result. If the result is single then a is set to 1. A similar addition could be done for ContinuedFraction. I will raise a new ticket to discuss possible API additions for ContinuedFraction. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441457#comment-17441457 ] Gilles Sadowski commented on NUMBERS-174: - bq. Having used both I prefer the generator implementation. But it adds another implementation to maintain. Hence, I'd favour option 4 (and deprecate the current implementation). > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441449#comment-17441449 ] Alex Herbert commented on NUMBERS-174: -- {quote}Do I understand correctly that either public or internal (in the "gamma" module) there will be two implementations of the continued fraction functionality in [Numbers]? {quote} At the moment yes. I agree that this is not good. I originally wrote the new version so I could be sure it was working as per the Boost version. Now I have had time to revisit this to see if ContinuedFraction could be used. My first thought was if there is no real benefit in accuracy then pick the one that is faster. It helps that it matches the reference source code to allow easier maintenance in the future when referring to the original code. The other side is that if there is no real benefit in speed then reuse what we have and so any improvements will naturally be collected by dependent code. Note that the Boost code has two continued fraction implementations. One has a leading term b0. This matches the ContinuedFraction class. The other has no leading b0 term. This is the method used for gamma Q. The methods are continued_fraction_a and continued_fraction_b. Evidently the type of fraction to use will depend on the terms and the magnitude of b0 relative to the remaining terms that must converge. Rather than forcing the ContinuedFraction to perform this evaluation by generating zero for term b0 it was simpler to create an implementation that closely matched the Boost code. However now that I had time to look at the effect of using ContinuedFraction in several variations it is clear that although the current method of computation is not as accurate it can be used in a similar method to match the accuracy. In the interest of reducing code maintenance I can revise the code to use ContinuedFraction with notes on why the computation is done using that method. The performance improvement of the Boost method is not very large and it may even be possible to remove any benefit at all with a similar method in ContinuedFraction that does not use the term b0. Current ContinuedFraction has the API: {code:java} /* a1 * b0 + -- * b1 + a2 * - * b2 + a3 * * b3 + ... */ public abstract class ContinuedFraction { protected abstract double getA(int n, double x); protected abstract double getB(int n, double x); public double evaluate(double x, double epsilon); // ... Variations with iteration limits }{code} The implementing class only has to provide method to get terms a and b, retrieved by index. It does allow the fraction to be reused to compute for a different argument. Or the same argument again. But it does not lend itself to using recursion to generate terms from the previous terms. I do not know why this API was chosen. It existed in Commons Math 3 so was created a long time ago. The Boost version is done via a generator that creates a and b together to evaluate with or without the leading term. This is an API similar to: {code:java} /* a1 * [b0 +] -- * b1 + a2 * - * b2 + a3 * * b3 + ... */ public class ContinuedFraction { // Uses all terms provided by the generator. // generator assumed to start at a1. public static double evaluateA(Supplier cf, double epsilon); // Discards the first term a0 provided by the generator and uses b0 public static double evaluateB(Supplier cf, double epsilon); // ... Variations with iteration limits }{code} Each call to the supplier gets the next terms. The supplier cannot be reused. The evaluate method does not know anything about how the terms are created, only the convergence epsilon. The generator can be created using an anonymous class to hold the current terms and generate the next recursively. Having used both I prefer the generator implementation. But it adds another implementation to maintain. As suggested above the ContinuedFraction class can be extended to offer the alternative computation without the leading term b0. I've quickly done this and added it to the benchmark with two variations that pass the current test suite with the same accuracy. The results are: ||implementation||Method||Score||Median|| |Numbers 1.0|evaluate|80835.4|80474.2| |Numbers Modified|evaluate|79672.2|79622.7| |Numbers Extended A|evaluate|75785.0|75748.5| |Numbers Extended A1|evaluate|73590.3|73534.9| |Boost|evaluate|70280.4|70208.7| Here the modifications to the ContinuedFraction class make it faster than the modifications to the usage of the existing ContinuedFraction. It is not quite as fast as the Boost method but closer. Thus m
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441355#comment-17441355 ] Gilles Sadowski commented on NUMBERS-174: - {quote} I recommend the code should not be updated to use ContinuedFraction. It should use the ported Boost version of the modified continued fraction. The alternative is to provide a second continued fraction implementation in numbers that does not use the leading term b0. {quote} Do I understand correctly that either public or internal (in the "gamma" module) there will be two implementations of the continued fraction functionality in [Numbers]? > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441348#comment-17441348 ] Alex Herbert commented on NUMBERS-174: -- h2. ContinuedFraction The regularized incomplete Gamma Q evaluation uses a continued fraction. The Commons Numbers RegularizedGamma.Q method uses ContinuedFraction. The code implements a variation of this sum: {noformat} 1 -- b0 + a1 / (b1 + a2 ) - b2 + a3 b3 + ... b0 = z - a + 1 a1 = a - 1 {noformat} It can be done several ways: {noformat} Method 1: Commons Numbers RegularizedGamma.Q getA(n, x) : n * (a - n) getB(n, x) : ((2 * n) + 1) - a + x result = 1 / cf.evaluate(z) Method 2: Modified use of ContinuedFraction getA(n, x) : n * (a - n) getB(n, x) : n == 0 ? 0 : ((2 * n) + 1) - a + x result = 1 / (z - a + 1 + cf.evaluate(z)) Method 3: Updated use of ContinuedFraction / Boost continued fraction getA(n, x) : (n+1) * (a - n - 1) getB(n, x) : ((2 * (n+1)) + 1) - a + x result = 1 / (z - a + 1 + (a - 1) / cf.evaluate(z)) {noformat} h3. Accuracy The error for each method: ||Data||Numbers 1.0|| ||Modified|| ||Boost|| ||RMS Diff to Boost|| || || ||Max||RMS||Max||RMS||Max||RMS||Numbers 1.0||Modified|| |IGAMMA_P_INT|3.33453|0.837093|3.33453|0.837093|3.33453|0.837093|0|0| |IGAMMA_P_SMALL|2.51022|0.707713|2.51022|0.707713|2.51022|0.707713|0|0| |IGAMMA_P_MED|4.31041|0.770837|4.31041|0.749316|4.31041|0.743446|0.027391|0.00587| |IGAMMA_P_BIG|412.953|48.2627|412.953|48.2623|412.953|48.2623|0.0004|0| |IGAMMA_P_EXTRA|0.502906|0.12985|0.502906|0.12985|0.502906|0.12985|0|0| |IGAMMA_Q_INT|5.46066|1.24445|4.83677|1.13616|4.83677|1.13616|0.10829|0| |IGAMMA_Q_SMALL|5.08353|1.18629|3.74672|0.961876|3.74672|0.961876|0.224414|0| |IGAMMA_Q_MED|6.81523|1.2909|5.35148|0.830843|5.35148|0.82354|0.46736|0.007303| |IGAMMA_Q_BIG|536.101|60.993|536.101|60.9895|536.101|60.9895|0.0035|0| |IGAMMA_Q_EXTRA|54.6793|15.1705|54.6793|15.1705|54.6793|15.1705|0|0| |IGAMMA_LOWER_INT|5.41081|1.12354|5.41081|1.08145|5.41081|1.07932|0.04422|0.00213| |IGAMMA_LOWER_SMALL|1.38617|0.472474|1.38617|0.472474|1.38617|0.472474|0|0| |IGAMMA_LOWER_MED|6.432|1.27844|5.69098|1.11614|5.99945|1.15425|0.12419|-0.03811| |IGAMMA_LOWER_BIG|7.29508|1.15197|7.29508|1.13324|7.29508|1.1326|0.01937|0.00064| |IGAMMA_LOWER_EXTRA|4.45394|1.19886|4.45394|1.19886|4.45394|1.19886|0|0| |IGAMMA_UPPER_INT|6.29133|1.55386|5.1089|1.34698|5.1089|1.36073|0.19313|-0.01375| |IGAMMA_UPPER_SMALL|4.90231|0.937614|2.2915|0.78016|2.2915|0.78016|0.157454|0| |IGAMMA_UPPER_MED|9.92377|2.16165|8.2182|1.65163|8.2182|1.68102|0.48063|-0.02939| |IGAMMA_UPPER_BIG|14.4738|1.58529|7.28961|1.07161|7.28961|1.1512|0.43409|-0.07959| |IGAMMA_UPPER_EXTRA|11.1293|3.59124|11.1293|3.59124|11.1293|3.59124|0|0| The original method in numbers is worse. The max and RMS error are higher than the Boost continued fraction method. An updated version using the ContinuedFraction implementation is sometimes better, sometimes worse than the Boost continued fraction. Overall it is marginally better but the differences are very small (these are measured in ULPs). Note: I implemented Method 3 using ContinuedFraction instead of the code ported from Boost. It had fractionally worse error for all test data. The continued fraction implementations are both using the same Lentz algorithm with some differences in the Boost method: # The tiny value is much smaller: 3.5601181736115222e-307 vs 1e-50. In practice this branch is never used on the entire suite of test data even using 1e-50 as the threshold. # There is no computation of the next value and checking it is not finite. The next value is computed without checks. Thus the ContinuedFraction class is more robust. However in this case the fraction is only evaluated when convergence is expected. # The Boost version never uses a max iterations and eps is set to the machine epsilon. The fraction should be evaluated when convergence is expected. The maximum iterations found in the test data was 51. The Boost code never checks the number of iterations. # The Boost termination is based on 'continue if (Math.abs(deltaN - 1) > epsilon' rather than 'stop if Math.abs(deltaN - 1) < epsilon'. Updating the Boost code to use 'continue if (Math.abs(deltaN - 1) *>=* epsilon' results in a maximum iterations found in the test data of 60 with +/- changes to the RMS error in the 3rd significant figure. So slower but no more accurate for the final result. h3. Speed I performed a JMH speed test of the methods using all cases from the test data that use the continued fraction evaluation. This has a mixture of data requiring max iterations from 5 to 62 for the modified ContinuedFraction method 2. ||implementation||Method||Score||Median|| |Numbers 1.0|evaluate|81397.7|80773.
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441330#comment-17441330 ] Alex Herbert commented on NUMBERS-174: -- h2. Improving the Regularised Gamma Function The majority of error in this function is in computation of the prefix term: {noformat} (z^a) (e^-z) / tgamma(a){noformat} Notes: # The tgamma(a) function is infinite above a ~ 171. # Computing tgamma(a) requires use of a power function and exponential and evaluation of the Lanczos approximation To reduce error in computing two power terms and two exponential terms the Boost code uses an optimisation that combines the terms and calls pow and exp only once. See [Boost igamma Equation 15|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]. This works well if evaluated using long double. When evaluated using double then there is some loss of precision with all the terms used in the rearrangement. I updated the code to detect when a <= 171, and when the pow and exp functions will not overflow/underflow. If these conditions are met then the direct evaluation can be performed. Assuming pow and exp are within 1 ULP then the majority of error is from tgamma(a). I added more data with gamma values from 150 to 171. Tests show that this function is accurate to around 4 ulp until the result overflows to infinity. So direct computation is valid until overflow conditions are reached. Here are the results for the previous code and the new version which uses the direct computation of the result when possible. ||Data||Previous|| ||New|| || || || || ||Max||RMS||Max||RMS||Max Diff||RMS Diff|| |GAMMA_P_DERIV_INT|22.4857|6.10843|3.16382|0.974935|-19.32188|-5.133495| |GAMMA_P_DERIV_SMALL|3.12638|0.901515|3.12638|0.901515|0|0| |GAMMA_P_DERIV_MED|57.2834|14.5155|4.39499|1.00498|-52.88841|-13.51052| |GAMMA_P_DERIV_BIG|538.126|52.8767|538.126|52.8391|0|-0.0376| |TGAMMAO_VERY_NEAR_0|3.16585|0.651873|3.16585|0.651873|0|0| |IGAMMA_P_INT|19.7402|4.41416|3.33453|0.837093|-16.40567|-3.577067| |IGAMMA_P_SMALL|2.51022|0.707713|2.51022|0.707713|0|0| |IGAMMA_P_MED|57.4788|10.326|4.31041|0.743446|-53.16839|-9.582554| |IGAMMA_P_BIG|412.953|48.3047|412.953|48.2623|0|-0.0424| |IGAMMA_P_EXTRA|3.44946|1.00459|0.502906|0.12985|-2.946554|-0.87474| |IGAMMA_Q_INT|13.4607|2.1156|4.83677|1.13616|-8.62393|-0.97944| |IGAMMA_Q_SMALL|3.74672|0.961876|3.74672|0.961876|0|0| |IGAMMA_Q_MED|41.1248|5.71137|5.35148|0.82354|-35.77332|-4.88783| |IGAMMA_Q_BIG|536.101|60.9956|536.101|60.9895|0|-0.0061| |IGAMMA_Q_EXTRA|54.6793|15.1705|54.6793|15.1705|0|0| Use of the direct computation reduces error. It will require an additional evaluation of pow and exp. But the alternative computation requires a sqrt and lots of branch checks to determine overflow conditions. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440819#comment-17440819 ] Alex Herbert commented on NUMBERS-174: -- {quote}Could [Numbers] provide an equivalent to Boost's "long double" computations? {quote} I do not know without trying. The exp function provides a 62-bit number. This is used by the pow function. It computes the power by using: {noformat} a^x = exp(x ln a){noformat} The log function also provides extended precision output for use by the exp function. So the log, pow and exp functions from AccurateMath would have to be updated to allow the extended precision output to be obtained. The result can then be computed using the 62-bit numbers. BigDecimal cannot be used. It does not have a non-integer exponent power function; exp or log. Addition of these elementary functions to BigDecimal would be nice. Dfp may be possible since it provides extended precision for the required elementary functions. Thanks for highlighting this. Use of that class would be worth investigating. Currently I am focussed on obtaining what is possible with standard double precision. I believe I have found a fix that computes to better accuracy for the 'int' and 'medium' data for the incomplete gamma functions. It is the 'big' data that cannot be computed without loss of precision. In this case it is more about the function gracefully failing to compute rather than catastrophically failing to compute. Better precision would be a work for the future. Time would be better invested initially by looking into porting the Beta function which also has some evaluation issues (see STATISTICS-25 and NUMBERS-169). > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440635#comment-17440635 ] Gilles Sadowski commented on NUMBERS-174: - It looks like major improvements are underway. :) {quote}Note that Commons Math 4 AccurateMath can compute these functions to 62 bit precision using a split double number. This is used internally for accurate function rounding. However the API is not exposed and the exp and pow results are limited to the 53-bit precision of a double. {quote} Could [Numbers] provide an equivalent to Boost's "long double" computations? Could the [{{Dfp}}|https://gitbox.apache.org/repos/asf?p=commons-math.git;a=blob;f=commons-math-legacy-core/src/main/java/org/apache/commons/math4/legacy/core/dfp/Dfp.java;h=02f7df81e38d533d893e96597a9dbbf2cf7f390c;hb=HEAD] and [related classes|https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=commons-math-legacy-core/src/main/java/org/apache/commons/math4/legacy/core/dfp;h=c548c79c6e29a966e147994f6ad1857d71340285;hb=HEAD] be of use (and ported/refactored into [Numbers])? Alternatively, should some of the computations use {{BigDecimal}} internally in order to ensure accuracy equivalent to Boost's? Can this be done without "too much" loss in performance (e.g. in the domains where inaccuracies would show up in the {{double}} precision result)? > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440092#comment-17440092 ] Alex Herbert commented on NUMBERS-174: -- h2. Gamma P Derivative This function is not in the gamma package. It is used in statistics for the gamma distribution. This function is defined as: {noformat} f(a, z) = z^a * exp(-z) / Gamma(a) / a = z^(a - 1) * exp(-z) / Gamma(a){noformat} It can be implemented with logs: {noformat} f(a, z) = exp(a * log(z) - z - logGamma(a) - log(z)){noformat} The code in Commons Statistics attempts to compute this with high accuracy but the choice of function can be incorrect (see STATISTICS-39). I created derivative data using the same a and z values in the Boost datasets for the incomplete gamma functions. The result was computed using maxima to 64 digits. The simple implementation using logs was tested against the function from Commons Stats and the new Boost implementation. ||Data||Log|| ||Stats|| ||Boost|| || || ||Max||RMS||Max||RMS||Max||RMS|| |GAMMA_P_DERIV_INT|221.051|44.0093|88.5116|8.13219|22.4857|6.10843| |GAMMA_P_DERIV_SMALL|104.312|20.5646|452.647|67.9785|3.12638|0.901515| |GAMMA_P_DERIV_MED|476.683|128.642|66.1587|3.14963|57.2834|14.5155| |GAMMA_P_DERIV_BIG|1.01E+07|1.02E+06|555.477|62.9605|538.126|52.8767| h3. Int Data The use of logs is worse than the stats or Boost method. The Boost method has much lower max error than the stats method but the RMS is similar. h3. Medium Data The use of logs is worse than the stats or Boost method. The Boost method has lower max error than the stats method but the RMS is worse for Boost. This can be traced to the following errors above 50 ulp: {noformat} stats: GAMMA_P_DERIV_MED a=0.9759566783905029, z=0.009759566746652126: expected <1.091103827394005722772134927295910273305494319535696726782190113> != actual <1.091103827393991> (ulps=-66.15874662180883) boost: GAMMA_P_DERIV_MED a=79.829345703125, z=0.7982934713363647: expected <2.057948567669980623962755091227670264152698891095754996007072457E-125> != actual <2.0579485676699656E-125> (ulps=-57.28338674936721) GAMMA_P_DERIV_MED a=84.98836517333984, z=0.8498836755752563: expected <1.584723140811257432756401992975844384590043887470012851849659925E-133> != actual <1.5847231408112464E-133> (ulps=-56.62359924370867) {noformat} So the Boost method has two occurrences where it fails to compute as accurately. It has a different method when z < 1 which computes: {noformat} Math.pow(z, a) * Math.exp(-z) / tgamma(a) / a {noformat} If changed to: {noformat} Math.pow(z, a - 1) * Math.exp(-z) / tgamma(a) {noformat} The errors above 50 ulp are eliminated: {noformat} GAMMA_P_DERIV_MED max45.9872 RMS11.8938 {noformat} This fix does not work for all parameters. The method should not be changed without further investigation of appropriate limits to switch computations. h3. Small Data Direct use of logs performs better on this data than the method from statistics. This is data that demonstrates the incorrect choice of evaluation method as discussed in STATISTICS-39. The Boost method has far superior performance on this data as it correctly chooses to compute the direct formula (Math.pow(z, a) * Math.exp(-z) / tgamma(a) / a) rather than using logs or alternative computations with the Lanczos approximation. h3. Big Data Use of logs here is incorrect. Both the stats and Boost method are using an appropriate rearrangement of the computation to avoid cancellation errors. The stats method is based on the documentation for the Boost function. The difference may be due to the use of the computation log1p(x) - x which has a special method in Boost (log1pmx(x)) to compute this without loss of accuracy when the argument x is close to 0. h2. Overall The method using logs is outperformed by other methods. This is the subject of MATH-753. Using logs can result in cancellation for some parameters where alternative computation is superior. The method from statistics is successful in computing the result for large z. It fails when z is small. The Boost method is robust across all data. It would be recommended to expose it through a public API. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440014#comment-17440014 ] Alex Herbert commented on NUMBERS-174: -- h2. Incomplete Gamma(a, z) ||Data||Modified|| ||Boost|| || || ||Max||RMS||Max||RMS|| |IGAMMA_LOWER_INT|3.7574|0.95297|5.41081|1.07932| |IGAMMA_LOWER_SMALL|1.38617|0.472474|1.38617|0.472474| |IGAMMA_LOWER_MED|8.18929|1.36357|5.99945|1.15425| |IGAMMA_LOWER_BIG|7.29508|1.17811|7.29508|1.1326| |IGAMMA_UPPER_INT|6.92561|1.50143|5.1089|1.36073| |IGAMMA_UPPER_SMALL|2.2915|0.78016|2.2915|0.78016| |IGAMMA_UPPER_MED|9.0427|1.70797|8.2182|1.68102| |IGAMMA_UPPER_BIG|6.49026|1.06544|7.28961|1.1512| h3. Notes This function is not implemented in the current numbers gamma library. It can be done by multiplying the value P or Q by gamma(a). When a > 171 or close to zero then the result is likely to be infinite. For reference I updated the Boost function to call the current numbers implementations for gamma and log gamma. h3. Performance The functions compute accurately for each dataset. When the current implementation for gamma and log gamma is used (Modified) then the performance is very similar. There are a few cases where the Boost function is worse using its own gamma functions. The worst case for IGAMMA_LOWER_INT is: {noformat} tgammaLower(20.5, 20.5){noformat} The result is computed by multiplying a series term by the value of gamma(20.5). The result of this function is compared to the result from maxima to 64 digits: {noformat} gamma(20.5) maxima 5.406242982335075044736873647808221412273166395781714789601948262e17 BOOST 5.4062429823350765e17 (2 ulp) current 5.4062429823350726E17 (-4 ulp){noformat} In this case the Boost function is more accurate. The error is in the term it is multiplied by. This term is too big. Thus when multiplied by the commons numbers gamma result the two errors cancel and the final result is closer to the expected answer. The worst case for IGAMMA_UPPER_BIG is: {noformat} tgamma(95.06404113769531, 95.06404113769531); {noformat} Compute p via a lower gamma series multiplied by incompleteGammaPrefix: 5.6549889907651e146. In this case the series is evaluated using an optimisation to reduce the number of iterations. {noformat} initValue = -a * tgamma(a) / fullIgammaPrefix(a, x) {noformat} tgamma(a) is used for the initial value. Again the terms can be checked against the true answer: {noformat} gamma(95.06404113769531) maxima 1.455104768555175739015723834289112471039232295182085666150559959e146 BOOST 1.455104768555176E146 (2 ulp) current 1.4551047685551754E146 (-1 ulp) {noformat} So this is a case where the current implementation is closer to the true result for the gamma(z) function. Checking cases for the worst errors shows that the final accuracy can be dependent on a few ULP differences in evaluation of the gamma function. Given that the Boost gamma function is overall marginally more accurate when tested against a range of data for z > 20 this is not a difference that is considered significant. It would be recommended to keep the Boost gamma implementation. h2. Overall This function is accurate when the result is computable (i.e. not infinite). It would be recommended to expose it through a public API. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in >
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440011#comment-17440011 ] Alex Herbert commented on NUMBERS-174: -- h2. Regularised Incomplete Gamma(a, z) : P and Q ||Data||Current|| ||Boost|| || || ||Max||RMS||Max||RMS|| |IGAMMA_P_INT|255.083|39.2453|19.7402|4.41416| |IGAMMA_P_SMALL|32.9236|11.2031|2.51022|0.707713| |IGAMMA_P_MED|963.153|163.467|57.4788|10.326| |IGAMMA_P_BIG|1.63E+07|1.14E+06|412.953|48.3047| |IGAMMA_Q_INT|300.539|48.5842|13.4607|2.1156| |IGAMMA_Q_SMALL|5.66E+11|7.27E+10|3.74672|0.961876| |IGAMMA_Q_MED|1071.29|176.451|41.1248|5.71137| |IGAMMA_Q_BIG|1.63E+07|1.25E+06|536.101|60.9956| h3. Notes The regularised incomplete gamma functions have a value between 0 and 1. They are the incomplete gamma function divided by gamma(a). When a > 171 then gamma(a) is infinite and the normalisation uses log gamma. The functions P and Q can be expressed as a series (P) or a continued fraction (Q). There are other methods of evaluation too. The current implementation always the series or continued fraction and uses log gamma for normalisation. This may result is inaccuracy if gamma(z) is computable. The Boost implementation has special handling to use different computations for different ranges of the arguments. It normalises without logs if possible. It avoids the series and continued fractions when the result is close to 0.5 and the series are slow to converge. It has special handling for large arguments to attempt to compute what is possible given the precision of the floating-point type. h3. Int Data The current function computes reasonable accuracy for this data. The Boost function is an order of magnitude more accurate. h3. Small Data The current function computes reasonable accuracy for this data for P. The Boost function is an order of magnitude more accurate. The current function has very large max errors for Q. This is for data approaching q=0. The Boost function maintains accuracy for this data. h3. Medium Data The current function has moderate accuracy for this data. It has no obvious failings with very large max error. The Boost function is again over an order of magnitude more accurate. h3. Big Data The current function fails dramatically on this data. The Boost function is reasonable. I have tracked the Boost error for this data for the worst case. The function is: {noformat} gamma_q(a=2057.796630859375, z=4115.59326171875){noformat} In this case the function must evaluate a power term to be multiplied by a series value: {noformat} Math.pow(z, a) * Math.exp(-z) {noformat} This term is subject to large cancellation errors. Without special handling the power function overflows and the exp function underflows. If replaced with the exact value computed using extended precision then the result of the gamma_q function on these arguments is correct. Thus the default Boost policy to evaluate in long double for a double result handles this data. A long double typically has a 15 bit exponent for a maximum value of 2{^}+/-16383{^} and a 60-bit mantissa. This can handle these values for the power and exp term without cancellation. When evaluated using 53-bit precision this function is at the limit of what can be computed. The only way to improve accuracy for the Boost implementation in this case is an extended precision pow and exp function. Note that Commons Math 4 AccurateMath can compute these functions to 62 bit precision using a split double number. This is used internally for accurate function rounding. However the API is not exposed and the exp and pow results are limited to the 53-bit precision of a double. h2. Overall The Boost function is superior for all data. It is particularly robust to extreme inputs. The error for these arguments is due to the limited precision of the power and exponential functions. This would recommend switching to the Boost function. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideal
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440007#comment-17440007 ] Alex Herbert commented on NUMBERS-174: -- h2. Log Gamma(z) ||Data||Current|| ||Boost|| || || ||Max||RMS||Max||RMS|| |LGAMMA_FACTORIALS|3.32911|0.8279|0.645222|0.0682534| |LGAMMA_NEAR_0|1.09044|0.452423|1.09044|0.452423| |LGAMMA_NEAR_1|1.73052|0.554198|1.43922|0.577705| |LGAMMA_NEAR_2|1.70538|0.681634|0.551573|0.120172| |LGAMMA_NEAR_M10|5.59972|1.72064|2.89892|0.850624| |LGAMMA_NEAR_M55|1.74026|0.729566|0.740256|0.352296| |LGAMMA_M20_0|156.708|13.3007|92.708|7.90596| |LGAMMA_0_20|2.14863|0.76859|0.844097|0.188636| |LGAMMA_20_150|2.04684|0.693322|1.28226|0.331251| |LGAMMA_VERY_NEAR_0|1.63002|0.416674|1.61922|0.271574| h3. Notes Negative arguments require special handling. For the current implementation I added the functionality from Boost so the function returns a result for negative z. The current function computes using different methods for z in [0, 0.5), [0.5, 2.5], (2.5, 8], (8, inf). The final range uses the Lanczos approximation. The Boost function computes using different methods for z in [0, 1.5e-8), z < 15, z < 100, z >= 100. For z < 15 there is a dedicated function for small z which computes accurately at the roots for z=1, z=2. For z < 100 the function computes the log of tgamma(z). Above z=100 the Boost function used a different set of coefficient for the Lanczos approximation for log gamma. The coefficients have been divided by the term exp(g) where g is the Lanczos coefficient. This avoids having to subtract log(g) during the computation. h2. Overall The two functions are equivalent near 0. Very near zero the Boost function is better. On the other data the Boost function is more accurate with the exception of a slightly higher RMS error for the near 1 dataset. The max error on this dataset is lower. This would recommend switching to the Boost function. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > implement all the suggested optimisations detailed in the most recent > Boost documentation. This includes avoiding using the converging > series of the gamma function when the convergence is slow or unstable, > i.e. addresses the need for a configurable convergence threshold in > the Poisson distribution. > One part of the evaluation of the incomplete gamma functions require > accuracy in the leading term: > {noformat} > x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat} > When x and a are large then using logs to compute this leads to > cancellation. Use of logs to compute this term is done in the current > implementation in numbers for RegularizedGamma P and Q. It is the > source of low precision for the CDF for the Poisson distribution when > the mean is large. > I propose to port the Boost gamma functions to numbers. This will be a > process similar to the recent port of the error functions to the same > package in numbers. These functions improved both accuracy and > speed over the existing implementation. Once the gamma functions are > ported a comparison can be made between the existing and new > implementations. > [1] > [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439960#comment-17439960 ] Alex Herbert commented on NUMBERS-174: -- h2. Gamma(z) ||Data||Current|| ||Modified|| ||Boost|| || || ||Max||RMS||Max||RMS||Max||RMS|| |TGAMMA_FACTORIALS|3.89345|1.08579| | |2.2426|0.619793| |TGAMMA_NEAR_0|1.46384|0.550829| | |3.16585|1.10525| |TGAMMA_NEAR_1|1.10029|0.573537| | |3.10029|1.00016| |TGAMMA_NEAR_2|0.985467|0.413246| | |2.73141|1.10202| |TGAMMA_NEAR_M10|1.56073|0.534165| | |2.37076|1.01606| |TGAMMA_NEAR_M55|7.08E+06|1.29E+06|2.76601|0.945008|2.2891|1.08438| |TGAMMA_M20_0|2.7161|0.74157| | |4.35554|1.26587| |TGAMMA_0_20|1.86447|0.54758| | |2.97733|1.09463| |TGAMMA_20_150|Infinity|Infinity|5.77643|1.8674|3.35312|0.994272| |TGAMMA_VERY_NEAR_0|3.30574|0.567205| | |3.30574|0.567205| h3. Notes Both implementations switch to using a [Lanczos approximation|https://www.boost.org/doc/libs/1_64_0/libs/math/doc/html/math_toolkit/lanczos.html] at abs(z) > 20. The Boost version also uses this approximation below 20. Negative values are shifted into the positive domain using recursion properties of the gamma function. Very close to zero (z < 1.5e-8) there is a switch to gamma(z) = 1/z - E where E is Euler's constant. This data is tested using TGAMMA_VERY_NEAR_0. h3. |z| < 20 The current implementation of the gamma function is more accurate than the Boost version when |z| < 20. The Boost version uses the Lanczos approximation here whereas the current implementation using the NSWC Library of Mathematics Subroutines has a dedicated routine suitable for double precision. Note that the default policy in Boost is to promote double to long double for computations. I verified that if this is performed then the C++ Boost version has zero error. Thus the implementation is partly relying on having extended precision to act as guard digits during evaluation of the Lanczos approximation at small z. h3. |z| very small Both implementations are equal when |z| approaches 0. h3. z > 20 This uses the Lanczos approximation. This value is then multiplied by power terms. In the current code this is: {noformat} pow(z + g + 0.5, z + 0.5) / exp(z + g + 0.5) g = Lanczos constant (about 4.74) {noformat} These terms can be very large and there is no detection of potential overflow here. Thus the current code has infinite error when overflow begins. In the modified column I have rerun the code with no values that cause overflow (z > 141.5). In this case the Boost code is still more accurate. The Boost code has overflow detection and a modification to compute the power terms with a rearrangement. It can compute gamma(z) up to the limit of z at around 172. h3. Negative z The result for negative z is {noformat} result = -pi / z * sin(pi * z) * gamma(|z|) {noformat} The term {{z * sin(pi * z)}} must be evaluated correctly. If not then you get very large errors as shown for the M55 case for the current code. In the modified column I updated the code to use the Boost sinpx function. Here the error reduces to close to the error of the Boost version. h3. Factorials The Boost implementation uses precomputed values of the factorial for all integers. This gives it an advantage on this data. I did not rerun the test using only half-integers to eliminate this advantage. h2. Overall The current implementation is very accurate. It is better than the Boost implementation at z < 20. At large |z| the Boost implementation is marginally better across the range of tested values (20 - 150). It handles z up to the limit of the gamma function and has special evaluation of z * sin(pi * z) for negative z. This would recommend switching to a hybrid function combining the current implementation for |z| < 20 with the Boost for |z| > 20. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a thres
[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation
[ https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439909#comment-17439909 ] Alex Herbert commented on NUMBERS-174: -- I have ported a set of Gamma functions from Boost C++ to Java: ||Function||Description||Present in Commons Numbers gamma package|| |tgamma(z)|'True' gamma function|Y. Gamma.value(z)| |lgamma(z)|Log of the gamma function|Y. LogGamma.value(z) Support in numbers is only for x > 0. The Boost function adds support for negative numbers (where gamma(z) can be positive or negative) by returning the log of the absolute value of the gamma function of z: log(abs(gamma(z))). The sign of the gamma(z) result is returned as an output.| |tgamma(a, z)|Incomplete upper gamma function|N| |tgamma_lower(a, z)|Incomplete lower gamma function|N| |gammaP|Regularised incomplete lower gamma function|Y. RegularizedGamma.P.value(z)| |gammaQ|Regularised incomplete upper gamma function|Y. RegularizedGamma.Q.value(z)| |gammaPDerivative|Derivative of the incomplete lower gamma function|N. This is the probability density function of the Gamma distribution. There is an implementation in Commons statistics.| h2. Test Data The Boost C++ code has test files which I have used and added to with additional data generated using maxima: h3. Gamma Data ||Data||Source||Description|| |Gamma factorials|Boost|Integers and half-integers in the range 1 to 99.5| |Gamma near 0|Boost|z in [-0.5, 0.5]| |Gamma near 1|Boost|z in [0.5, 1.5]| |Gamma near 2|Boost|z in [1.5, 2.5]| |Gamma near -10|Boost|z around the negative pole at -10| |Gamma near -55|Boost|z around the negative pole at -55| |Gamma 20 - 0|maxima|z in [-20, 0] in intervals of 0.125 excluding integers| |Gamma 0 - 20|maxima|z in [0, 20] in intervals of 0.125 excluding integers| |Gamma 20 - 150|maxima|z in [20.75, 150.25] in intervals of 0.5| |Gamma very near 0|maxima|z in +/-[2^-14, 2^-60] in powers of 2| Note the gamma evaluation uses a Lanczos approximation when z > 20. Evaluation for negative values requires additional computation. Thus extra datasets targeting -20 to 0, 0 to 20 and above 20 have been created to test the Boost implementation against the current implementation with and without Lanczos support. The limit for gamma(z) is approximately 171. Note that gamma(z) = (n-1)! and the maximum representable factorial as a double is 170. h3. Incomplete Gamma Data ||Data||Source||Description|| |igamma int|Boost|a in [0.5, 37] integer of half-integer; z chosen to test the extremes of the p and q result at 0, 0.5, and 1.0| |igamma small|Boost|a in [0.165e-11, 0.05], z chosen to test the range of the p and q result at 0, 0.5, and 1.0| |igamma medium|Boost|a in [0.97, 99.65], z chosen to test the range of the p and q result at 0, 0.5, and 1.0| |igamma big|Boost|a in [0.15e-5, 1736170], z chosen to test the extremes of the p and q result at 0, 0.5, and 1.0. Many expected results in this dataset are zero or infinity as a double.| |gamma p deriv|maxima|a and z in [0.25 + 2^i / 8] with i in [0, 13]; range 0.375 to 1024.25| h2. Results Analysis Results files have approximately 50 to 300 data points. Expected results are evaluated to at least 64 significant figures and compared to the actual computed values using BigDecimal to obtain fractional ULP differences. Any computed result value below 0.5 ulp from the expected result is exact. Otherwise the ulp difference is used to compute the maximum error and the root mean square error (RMS). Results for each function will follow in separate posts. > Update Gamma functions using the Boost implementation > - > > Key: NUMBERS-174 > URL: https://issues.apache.org/jira/browse/NUMBERS-174 > Project: Commons Numbers > Issue Type: Improvement > Components: gamma >Affects Versions: 1.0 >Reporter: Alex Herbert >Assignee: Alex Herbert >Priority: Major > Fix For: 1.1 > > > The current regularised incomplete gamma functions for P and Q compute using > a series representation for all input. This method is not robust to extreme > arguments. > The Gamma functions are used in Commons Statistics in the Gamma and Poisson > distributions. > Large values of the mean in the Poisson distribution have low > precision for the CDF. The distribution also has a configurable > threshold for convergence of the gamma function evaluation > (STATISTICS-38). Ideally this implementation detail should be > removed from the API. > The Gamma distribution has a function switch based on a threshold to > compute the PDF. This threshold results in incorrect values for > certain parameters (STATISTICS-39). > The function switch in the Gamma distribution is based on the > documentation for the Boost gamma functions [1]. However it does not > impl