[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-25 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449039#comment-17449039
 ] 

Alex Herbert commented on NUMBERS-174:
--

New code for review in [PR 
110|https://github.com/apache/commons-numbers/pull/110]

Note: Currently the Travis CI integration with Github is not working for PRs. 
Travis will build master but does not pick up PRs. So this should be verified 
locally.

I tried changing .travis.yml to update the JDK environment from openjdk9 to 
openjdk11 but this was not the cause since the master branch builds but not the 
PR branch. No other changes to the .travis.yml have been done since the last PR 
build 1 month ago ([Travis Commons Numbers 
PRs|https://app.travis-ci.com/github/apache/commons-numbers/pull_requests]).

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-15 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444012#comment-17444012
 ] 

Alex Herbert commented on NUMBERS-174:
--

OK. I'll put the code together as a PR for review. I can put the new code in 
one commit and then update the public API in a second.

 

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-15 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444008#comment-17444008
 ] 

Gilles Sadowski commented on NUMBERS-174:
-

bq. are you suggesting not exposing the computation of the incomplete gamma 
function?

No.  I just prefer
{quote}
{code}
IncompleteGamma.Lower.value(double a, double x);
{code}
{quote}
over
{quote}
{code}
IncompleteGamma.lower(double a, double x);
{code}
{quote}

bq. LanczosApproximation [...] could be marked as deprecated [...]

+1

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-15 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443993#comment-17443993
 ] 

Alex Herbert commented on NUMBERS-174:
--

{quote}but I would not define any other such functions (i.e. Lower.value(...) 
and Upper.value(...)
{quote}
So are you suggesting not exposing the computation of the incomplete gamma 
function?

These functions do overflow with moderate arguments as the value of gamma(z) 
used for normalisation grows more than exponentially. Matlab does not have the 
standard functions. It has the incomplete gamma functions as the regularised 
versions. Then it provides scaled versions. See [Matlab 
gammainc|https://uk.mathworks.com/help/matlab/ref/gammainc.html].
{code:java}
p(a, z) scaled = p(a, z) * gamma(a+1) e^z / x^a
{code}
"This scaling cancels out the asymptotic behavior of the function near 0, which 
avoids underflow with small arguments."

Leaving out the standard incomplete gamma functions would add only the 
following to the API:
{code:java}
LogGamma.value(z, int[] sign);

RegularizedGamma.P.derivative(double a, double x);
RegularizedGamma.Q.derivative(double a, double x); /* Optional */
{code}
The derivative is all that is required for the Gamma function to be used in the 
statistics project for the PDF and log PDF.

I am fine with adding just this to the API. Then looking at the Complementary 
functions API. The usage of the Gamma functions in statistics requires the 
regularized gamma function with fixed a (Gamma, Nakagami) or fixed z (Poisson).

Note that the RegularizedBeta class has a complement too which is not currently 
in the API (see NUMBERS-169). There is also this old ticket from Math regarding 
accuracy of the RegularizedBeta NUMBERS-52. So my next step was to look at 
porting the Boost beta function. This could also make used of the Complementary 
functions interface. The Boost beta functions have standard and regularised 
functions and the derivative of the regularised function so porting them should 
be able to use a similar process.

See:

[Boost incomplete 
beta|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_beta/ibeta_function.html]

[Boost beta 
derivative|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_beta/beta_derivative.html]

I also note that there is this ticket regarding the LanczosApproximation: 
[NUMBERS-47]. Since the class is only public for use in statistics then it 
could be marked as deprecated when statistics no longer depends on it and uses 
the new gamma function to compute the derivative.

 

 

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. T

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-15 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443951#comment-17443951
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. Log of the Gamma P Derivative

I tried this with a few methods:

 
||Method||Description||
|Logs|a * log(z) - log(z) - x - lgamma(a)|
|Log(P Derivative)|log(Gamma P Derivative) if finite non zero; else use logs|
|Logs (high precision sum)|a * log(z) - log(z) - x - lgamma(a) summed in high 
precision using BigDecimal|
h3. Results

 
||Data||Logs|| ||Log(P derivative)|| ||Logs (high precision sum)|| ||
|| ||max||RMS||max||RMS||max||RMS||
|LOG_GAMMA_P_DERIV1_SMALL|6.35E+11|4.15E+10|1.69E+10|1.20E+09|4.64E+11|3.56E+10|
|LOG_GAMMA_P_DERIV1_INT|48.0818|8.37842|1.65059|0.384271|24.9378|5.83872|
|LOG_GAMMA_P_DERIV1_MED|174.155|32.6245|5.94376|0.422254|153.955|28.5913|
|LOG_GAMMA_P_DERIV1_BIG|1.02E+06|120858|37922.7|2787.94|1.01E+06|108370|

The small data has some values very close to 0. These are not computed well by 
any method. The cancellation errors are significant. The error for 1e9 ulps is 
after about 23 bits of the 53-bit mantissa, or around 1e-9 relative error.

The logs method is not as accurate as using the log of the standard result. 
This is especially true for the big data. Summing in high precision only 
marginally reduces the error.

It is better to use the log of the derivative if that can be computed.

I tried extracting all the logic from the method to compute the gamma P 
derivative into a method that directly returned the log result. This avoids a 
round trip of log(exp(result)). The method had the same accuracy but it would 
be faster. Creating a dedicated method is only required for performance reasons.

 

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-15 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443864#comment-17443864
 ] 

Gilles Sadowski commented on NUMBERS-174:
-

What would be the preferred API, considering the more common use case(s) for 
this family of functions?
I mean, does it make sense to assume that the {{ComplementaryFunctions}} 
interface (cf. NUMBERS-167) would cover most needs?
We could define an old-style _static_ function for all the new functions (since 
{{value}} is already part of the public API for the existing functionality), 
but I would not define any other such functions (i.e. {{Lower.value(...)}} and 
{{Upper.value(...)}} vs, respectively, {{upper(...)}} and {{lower(...)}} in 
{{{}IncompleteGamma{}}}).
Wrt {{{}derivative{}}}, how about
{code:java}
public interface ComplementaryFunctions {
double value(double x);
double complement(double x);

default double derivative(double x) {
throw new UnsupportedOperationException();
}
}
{code}
?

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-15 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443805#comment-17443805
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. API Updates

The following functions are currently in the public API:
{code:java}
Gamma.value(double x);

LogGamma.value(double x);

RegularizedGamma.P.value(double a, double x);
RegularizedGamma.P.value(double a, double x, double epsilon, int maxIterations);

RegularizedGamma.Q.value(double a, double x);
RegularizedGamma.Q.value(double a, double x, double epsilon, int maxIterations);
{code}
These functions can be updated to call the Boost gamma implementation. The new 
implementation does support the epsilon and max iterations for series 
computations where applicable. In most cases the number of iterations is small 
(<100) or else other evaluation methods are used. It is not expected to require 
controlling the iterations but accuracy can be adjusted for performance. This 
is used in the Boost code to evaluate the CDF and the PDF together to 
relatively low accuracy for use in a fast gradient based inverse function.

In the case of LogGamma the new implementation supports negative x values 
whereas the current one does not. This can be exposed using:
{code:java}
LogGamma.value(double x, int[] sign);
{code}
The function returns log(abs(gamma( x )). The sign parameter will be set on 
output to the sign of the gamma function result if the argument in a non zero 
length array.

There are 3 new functions with suggested names:
{code:java}
IncompleteGamma.Lower.value(double a, double x);
IncompleteGamma.Lower.value(double a, double x, double epsilon, int 
maxIterations);

IncompleteGamma.Upper.value(double a, double x);
IncompleteGamma.Upper.value(double a, double x, double epsilon, int 
maxIterations);

RegularizedGamma.P.derivative(double a, double x);
RegularizedGamma.Q.derivative(double a, double x); {code}
Note that RegularizedGamma.Q.derivative(double a, double x) is the negation of 
RegularizedGamma.P.derivative(double a, double x). It can be omitted or added 
for completeness.

Note that the current RegularizedGamma P and Q classes are static inner 
classes. These could be deprecated and replaced with:
{code:java}
IncompleteGamma.P.value(double a, double x);
IncompleteGamma.P.value(double a, double x, double epsilon, int maxIterations);

IncompleteGamma.Q.value(double a, double x);
IncompleteGamma.Q.value(double a, double x, double epsilon, int maxIterations);
{code}
The IncompleteGamma could be an enum with 4 instances: Upper, Lower, P, Q. 
However using an enum eliminates the possibility to add methods to one instance 
of the enum. Thus 'P' could not have the additional method 'derivative(double 
a, double x)' which does not apply to all the other classes.

The simpler approach is to maintain the RegularizedGamma class and add an 
IncompleteGamma class with Upper and Lower inner classes, or upper and lower 
methods:
{code:java}
IncompleteGamma.lower(double a, double x);
IncompleteGamma.lower(double a, double x, double epsilon, int maxIterations);

IncompleteGamma.upper(double a, double x);
IncompleteGamma.upper(double a, double x, double epsilon, int maxIterations);
{code}
h2. Lanczos Approximation

The current API exposes a Lanczos approximation. The Boost code has its own 
Lanczos approximation. The results of the gamma function are on average more 
accurate when using the Boost variant. The Boost variant also has two 
evaluation methods, one used for gamma(z) and one for logGamma(z).

However the two approximations are not inter-changeable. A minor change is they 
use a different Lanczos constant g. A major change is some constant part of the 
multiplication factor is incorporated into the rational approximation for the 
Lanczos sum. Thus any code using the Lanczos sum and multiplying by a factor 
will have to update the factor:

[Commons Numbers Lanczos 
v1.0|https://commons.apache.org/proper/commons-numbers/commons-numbers-gamma/javadocs/api-1.0/org/apache/commons/numbers/gamma/LanczosApproximation.html]
{noformat}
gamma(z+1) = sqrt(2*pi) * (z + g + 0.5)^(z + 0.5) * e^-(z + g + 0.5) * 
lanczos(z)
gamma(z)   = sqrt(2*pi) * (z + g + 0.5)^(z + 0.5) * e^-(z + g + 0.5) * 
lanczos(z) / z{noformat}
[Boost Lanczos 
1_77_0|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/lanczos.html]

"Written in partial fraction form with the leading constants absorbed by the 
coefficients in the sum":
{noformat}
gamma(z) = (z + g - 0.5)^(z - 0.5) * e^-(z + g - 0.5) * lanczos(z){noformat}
I would also estimate that the current implementation is faster based on the 
number of floating point operations. This is not confirmed.

Options would be:
 # Expose the new version in a new API and optionally deprecate 
LanczosApproximation
 # Keep the LanczosApproximation and leave the Boost implementation package 
private

h4. Usage of LanczosApproximation

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-15 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443765#comment-17443765
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. Using Dfp to increase precision

The Dfp class provided arbitrary precision floating-point arithmetic (DFP = 
Decimal Floating Point library for Java). I have testing using this class with 
the version in Commons Math 3 to perform the computation of the regularised 
gamma prefix:
{noformat}
x^a e^-x / gamma(a){noformat}
I used 64 digits of precision which is enough for the current test data.

Results:
||Test||Standard|| ||Dfp|| ||Difference|| ||
|| ||max||RMS||max||RMS||max||RMS||
|GAMMA_P_DERIV_INT|3.16382|0.974935|3.16382|0.974935|0|0|
|GAMMA_P_DERIV_SMALL|3.12638|0.901515|3.12638|0.901515|0|0|
|GAMMA_P_DERIV_MED|4.39499|1.00498|4.39499|1.00498|0|0|
|GAMMA_P_DERIV_BIG|538.126|52.8391|2.43417|0.796512|-535.69183|-52.042588|
|IGAMMA_P_INT|3.33453|0.837093|3.33453|0.837093|0|0|
|IGAMMA_P_SMALL|2.51022|0.707713|2.51022|0.707713|0|0|
|IGAMMA_P_MED|4.31041|0.743446|4.31041|0.743446|0|0|
|IGAMMA_P_BIG|412.953|31.3717|3.92196|0.786162|-409.03104|-30.585538|
|IGAMMA_P_EXTRA|0.502906|0.12985|0.502906|0.12985|0|0|
|IGAMMA_Q_INT|4.83677|1.13616|4.83677|1.13616|0|0|
|IGAMMA_Q_SMALL|3.74672|0.961876|3.74672|0.961876|0|0|
|IGAMMA_Q_MED|5.35148|0.82354|5.35148|0.82354|0|0|
|IGAMMA_Q_BIG|537.101|42.6655|5.7472|0.917859|-531.3538|-41.747641|
|IGAMMA_Q_EXTRA|54.6793|15.1705|54.6793|15.1705|0|0|
|IGAMMA_LARGE_X_ASYMP_TERM|276.954|103.997|276.954|103.997|0|0|
|IGAMMA_LARGE_X_Q|537.101|120.441|1.0676|0.420528|-536.0334|-120.020472|
|IGAMMA_LARGE_X_P|0.772756|0.229714|0.772756|0.229714|0|0|
|IGAMMA_LARGE_X_Q_ASYM|537.101|170.353|344.395|119.68|-192.706|-50.673|
|IGAMMA_LARGE_X_P_ASYM|341.213|98.6633|337.213|97.9287|-4|-0.7346|
|IGAMMA_LOWER_INT|5.41081|1.07932|5.41081|1.07932|0|0|
|IGAMMA_LOWER_SMALL|1.38617|0.472474|1.38617|0.472474|0|0|
|IGAMMA_LOWER_MED|5.99945|1.15425|5.99945|1.15425|0|0|
|IGAMMA_LOWER_BIG|7.29508|1.1326|7.29508|1.1326|0|0|
|IGAMMA_LOWER_EXTRA|4.45394|1.19886|4.45394|1.19886|0|0|
|IGAMMA_LARGE_X_ASYMP_TERM|276.954|103.997|276.954|103.997|0|0|
|IGAMMA_LARGE_X_Q|537.101|120.441|1.0676|0.420528|-536.0334|-120.020472|
|IGAMMA_LARGE_X_P|0.772756|0.229714|0.772756|0.229714|0|0|
|IGAMMA_LARGE_X_Q_ASYM|537.101|170.353|344.395|119.68|-192.706|-50.673|
|IGAMMA_LARGE_X_P_ASYM|341.213|98.6633|337.213|97.9287|-4|-0.7346|
|IGAMMA_UPPER_INT|5.1089|1.36073|5.1089|1.36073|0|0|
|IGAMMA_UPPER_SMALL|2.2915|0.78016|2.2915|0.78016|0|0|
|IGAMMA_UPPER_MED|8.2182|1.68102|8.2182|1.68102|0|0|
|IGAMMA_UPPER_BIG|7.28961|1.1512|7.28961|1.1512|0|0|
|IGAMMA_UPPER_EXTRA|11.1293|3.59124|11.1293|3.59124|0|0|

Note that this change only affects the regularized incomplete gamma functions P 
and Q. In the case of the standard functions the result is typically 0 or 
infinity and so extended precision makes no difference.

In cases where the error is due to cancellation in the power terms using the 
Dfp class eliminates this error. However there are some cases where the error 
is in the other part of the computation, for example the series summation for 
large x which can be computed via the asymptotic approximation (all cases using 
LARGE_X in the name). These errors could only be eliminated by using extended 
precision for the remaining parts of the computation.

Using Dfp throughout the computation would have a big performance impact and 
require significant code changes. The cases where Dfp can improve the 
computation of the power terms are difficult to predict from the input 
arguments. For simplicity the method should use double precision and accept the 
loss of precision for large arguments.

 

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39)

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-12 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442877#comment-17442877
 ] 

Alex Herbert commented on NUMBERS-174:
--

{quote}To be sure: Do you always copy those explanations over to the source 
files?
{quote}
Yes.

But usually it works the other way as I start with comments in the source 
files, investigate and comment on what has been updated and why, then copy the 
findings to the Jira ticket.

In this case the easy option would be to disable this evaluation method and 
effectively set the computation back to the Boost version from 1_68_0 (pre Dec 
2018). But I wanted to find out if it could be used and when. I think there is 
a definite case for leaving the method in. But it should be used more 
conservatively.

I will raise this finding as an issue with the Boost maintainers.

I think the ported code is ready to merge, following some review. There is the 
outstanding issue of whether to merge with its own continued fraction 
implementation and then update it pending the resolution of NUMBERS-175.

I also investigated using Dfp to compute the power functions in extended 
precision. I can post up results of that too. It works very well to eliminate 
errors. I have no idea on performance implications but I think it would be 
significant. It is probably best to leave it out and think of it as a possible 
future enhancement.

 

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-12 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442831#comment-17442831
 ] 

Gilles Sadowski commented on NUMBERS-174:
-

As usual: Thanks for all this work!
To be sure: Do you always copy those explanations over to the source files?

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-12 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442702#comment-17442702
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. Incomplete Gamma evaluation for Large X

I found a bug in the implementation when evaluating the incomplete gamma for 
certain parameters. When x is large then the incomplete gamma function has an 
asymptotic approximation:

[Digital Library of Mathematical Functions: Incomplete Gamma: 
8.11|https://dlmf.nist.gov/8.11#E2]

This function is evaluated when:
{noformat}
(x > 1000) && ((a < x) || (Math.abs(a - 50) / x < 1)){noformat}
The result is a summation of terms:
{noformat}
t_{n+1} = t_n * (a - n - 1) / z   ;t_0 = 1{noformat}
In theory this works if a < z or after a few iterations the term is less than 
1. Eventually the term will reduce to zero.

But if a ~ z then the term starts very close to 1. It takes a long time to 
reduce to zero especially if the values are big, e.g. this will not converge 
within a million iterations: a = 1e30, z = nextUp(1e30).

The closer a is in magnitude to z then the longer this series takes to 
converge. The longer it takes to converge the worse the result due to the 
limited precision of the terms computed via recursion and accumulated rounding 
error.

This method was added to address the issue of computing incorrect values when 
an extremely large x was used (see [Boost Math issue 
168|https://github.com/boostorg/math/issues/168]), e.g.
{noformat}
gamma_p(22.5, 1e300){noformat}
For this case the first term is zero and the series would compute the result as 
1. However for this case the routine detects that the prefix for the 
regularised gamma function is 0 and does not even evaluate the series anyway.

The routine to compute the series is only used when the prefix for the 
regularised gamma function is non-zero. This requires terms a and z to be 
similar magnitudes. However there already exists a function to compute the 
result when a and z are similar magnitudes. So removal of the asymptotic 
evaluation method improves the results on the current test suite.

However there are situations where it is an advantage to use this method as it 
is faster than the alternative. So I have looked into how to preserve the 
method.

In a simplification the term n can be approximated as:
{noformat}
term_n = (a/z)^n{noformat}
This holds if a is extremely large as the subtraction of values from a will not 
change its magnitude very much.

Thus the number of iterations is approximately:
{noformat}
n = log(term_n) / log(a/z){noformat}
Since the series starts at 1 and increases then it will definitely stop when 
term_n is ulp(1.0). Setting term_n to ulp(1.0) = 2^-52 we get:
||a / z||a / z||Iterations||
|1 - 2^-1|0.5|52|
|1 - 2^-2|0.75|125|
|1 - 2^-3|0.875|270|
|1 - 2^-4|0.9375|558|
|1 - 2^-5|0.96875|1135|
|1 - 2^-6|0.984375|2289|
|1 - 2^-53|0.|3.25e17|

So it is not advised to use this method using the condition a < z without 
further checks.

I have set the condition to the far more conservative:
{noformat}
(x > 1000) && (a < x * 0.75){noformat}
So for large values the series will compute at most approximately 125 
iterations.

Given the other evaluation methods that use a series have maximum iterations in 
the double digits then this seems an appropriate limit.

When using this updated threshold the method is executed in the current test 
suite with the following iterations for:
{noformat}
664.0791015625  1328.158203125  49
1169.2916259765625  2338.583251953125   50
2057.796630859375   4115.59326171875    51{noformat}
These values for a / x ~ 0.5 match with the approximation for the number of 
iterations.

For these cases the final result is the same or better than not using the 
asymptotic approximation.

 

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  com

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-12 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442634#comment-17442634
 ] 

Alex Herbert commented on NUMBERS-174:
--

{quote}Wouldn't it be safer/cleaner to deprecate the {{ContinuedFraction}} class
{quote}
Ticket created NUMBERS-175.

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-10 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441990#comment-17441990
 ] 

Gilles Sadowski commented on NUMBERS-174:
-

bq. [...] possible API additions for ContinuedFraction

Wouldn't it be safer/cleaner to deprecate the {{ContinuedFraction}} class, and 
define
* RegularContinuedFraction
* GeneralizedContinuedFraction

based on the nomenclature used on the [Wikipedia article| 
https://en.wikipedia.org/wiki/Continued_fraction] ?


> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-10 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441736#comment-17441736
 ] 

Alex Herbert commented on NUMBERS-174:
--

The Boost [continued 
fraction|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/internals/cf.html]
 documentation has some more information on the reason for two variations of 
the ContinuedFraction.
{noformat}
A:

a1  
 --
 b1 + a2   
  -
  b2 +a3
   
   b3 + ...

B:

b0 +a1  
 --
 b1 + a2   
  -
  b2 +a3
   
   b3 + ...

This latter form is somewhat more natural in that it corresponds with the usual 
definition of a continued fraction, but note that the first a value returned by 
the generator is discarded. Further, often the first a and b values in a 
continued fraction have different defining equations to the remaining terms, 
which may make the "_a" suffixed form more appropriate.
{noformat}
There are some examples given which all use continued fraction B with the 
exception of the upper incomplete gamma function Q.
{noformat}
In this case the first couple of terms are different from the rest, so our 
fraction will start with the first "regular" a term.
{noformat}
So the use of the two variations allows for special cases where computation of 
the leading terms is not part of a regular series.

Note that the Boost code can also evaluate the function for a simple continued 
fraction where the a's are all 1:
{noformat}
A:

 1  
b0 + --
 b1 +   1   
  -
  b2 +1
   
   b3 + ...
{noformat}

This is done using some meta programming to allow the generator to return 
either a single or pair of values and then extracting a and b from the result. 
If the result is single then a is set to 1. A similar addition could be done 
for ContinuedFraction. I will raise a new ticket to discuss possible API 
additions for ContinuedFraction.


> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-09 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441457#comment-17441457
 ] 

Gilles Sadowski commented on NUMBERS-174:
-

bq. Having used both I prefer the generator implementation. But it adds another 
implementation to maintain.

Hence, I'd favour option 4 (and deprecate the current implementation).

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-09 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441449#comment-17441449
 ] 

Alex Herbert commented on NUMBERS-174:
--

{quote}Do I understand correctly that either public or internal (in the "gamma" 
module) there will be two implementations of the continued fraction 
functionality in [Numbers]?
{quote}
At the moment yes. I agree that this is not good. I originally wrote the new 
version so I could be sure it was working as per the Boost version. Now I have 
had time to revisit this to see if ContinuedFraction could be used.

My first thought was if there is no real benefit in accuracy then pick the one 
that is faster. It helps that it matches the reference source code to allow 
easier maintenance in the future when referring to the original code. The other 
side is that if there is no real benefit in speed then reuse what we have and 
so any improvements will naturally be collected by dependent code.

Note that the Boost code has two continued fraction implementations. One has a 
leading term b0. This matches the ContinuedFraction class. The other has no 
leading b0 term. This is the method used for gamma Q. The methods are 
continued_fraction_a and continued_fraction_b.

Evidently the type of fraction to use will depend on the terms and the 
magnitude of b0 relative to the remaining terms that must converge.

Rather than forcing the ContinuedFraction to perform this evaluation by 
generating zero for term b0 it was simpler to create an implementation that 
closely matched the Boost code. However now that I had time to look at the 
effect of using ContinuedFraction in several variations it is clear that 
although the current method of computation is not as accurate it can be used in 
a similar method to match the accuracy.

In the interest of reducing code maintenance I can revise the code to use 
ContinuedFraction with notes on why the computation is done using that method.

The performance improvement of the Boost method is not very large and it may 
even be possible to remove any benefit at all with a similar method in 
ContinuedFraction that does not use the term b0.

Current ContinuedFraction has the API:
{code:java}
/*              a1
 * b0 + --
 *      b1 +      a2
 *           -
 *           b2 +    a3
 *                
 *                b3 + ...
 */
public abstract class ContinuedFraction {
    protected abstract double getA(int n, double x);
    protected abstract double getB(int n, double x);
    public double evaluate(double x, double epsilon);

// ... Variations with iteration limits
}{code}
The implementing class only has to provide method to get terms a and b, 
retrieved by index. It does allow the fraction to be reused to compute for a 
different argument. Or the same argument again. But it does not lend itself to 
using recursion to generate terms from the previous terms. I do not know why 
this API was chosen. It existed in Commons Math 3 so was created a long time 
ago.

The Boost version is done via a generator that creates a and b together to 
evaluate with or without the leading term. This is an API similar to:
{code:java}
/*                a1
 *  [b0 +] --
 *     b1 +      a2
 *              -
 *          b2 +    a3
 *               
 *                   b3 + ...
 */
public class ContinuedFraction {
// Uses all terms provided by the generator.
// generator assumed to start at a1.
    public static double evaluateA(Supplier cf, double epsilon);
// Discards the first term a0 provided by the generator and uses b0
public static double evaluateB(Supplier cf, double epsilon);

// ... Variations with iteration limits
}{code}
Each call to the supplier gets the next terms. The supplier cannot be reused. 
The evaluate method does not know anything about how the terms are created, 
only the convergence epsilon. The generator can be created using an anonymous 
class to hold the current terms and generate the next recursively.

Having used both I prefer the generator implementation. But it adds another 
implementation to maintain.

As suggested above the ContinuedFraction class can be extended to offer the 
alternative computation without the leading term b0. I've quickly done this and 
added it to the benchmark with two variations that pass the current test suite 
with the same accuracy. The results are:
||implementation||Method||Score||Median||
|Numbers 1.0|evaluate|80835.4|80474.2|
|Numbers Modified|evaluate|79672.2|79622.7|
|Numbers Extended A|evaluate|75785.0|75748.5|
|Numbers Extended A1|evaluate|73590.3|73534.9|
|Boost|evaluate|70280.4|70208.7|

Here the modifications to the ContinuedFraction class make it faster than the 
modifications to the usage of the existing ContinuedFraction. It is not quite 
as fast as the Boost method but closer. Thus m

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-09 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441355#comment-17441355
 ] 

Gilles Sadowski commented on NUMBERS-174:
-

{quote}
I recommend the code should not be updated to use ContinuedFraction. It should 
use the ported Boost version of the modified continued fraction.

The alternative is to provide a second continued fraction implementation in 
numbers that does not use the leading term b0.
{quote}

Do I understand correctly that either public or internal (in the "gamma" 
module) there will be two implementations of the continued fraction 
functionality in [Numbers]?

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-09 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441348#comment-17441348
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. ContinuedFraction

The regularized incomplete Gamma Q evaluation uses a continued fraction. The 
Commons Numbers RegularizedGamma.Q method uses ContinuedFraction. The code 
implements a variation of this sum:
{noformat}
             1
--
b0 + a1 / (b1 +     a2       )
                -
                b2 +    a3
                     
                     b3 + ...

b0 = z - a + 1
a1 = a - 1
{noformat}
It can be done several ways:
{noformat}
Method 1: Commons Numbers RegularizedGamma.Q
getA(n, x) : n * (a - n)
getB(n, x) : ((2 * n) + 1) - a + x
result = 1 / cf.evaluate(z)

Method 2: Modified use of ContinuedFraction
getA(n, x) : n * (a - n)
getB(n, x) : n == 0 ? 0 : ((2 * n) + 1) - a + x
result = 1 / (z - a + 1 + cf.evaluate(z))

Method 3: Updated use of ContinuedFraction / Boost continued fraction
getA(n, x) : (n+1) * (a - n - 1)
getB(n, x) : ((2 * (n+1)) + 1) - a + x
result = 1 / (z - a + 1 + (a - 1) / cf.evaluate(z))
{noformat}
h3. Accuracy

The error for each method:
||Data||Numbers 1.0|| ||Modified|| ||Boost|| ||RMS Diff to Boost|| ||
|| ||Max||RMS||Max||RMS||Max||RMS||Numbers 1.0||Modified||
|IGAMMA_P_INT|3.33453|0.837093|3.33453|0.837093|3.33453|0.837093|0|0|
|IGAMMA_P_SMALL|2.51022|0.707713|2.51022|0.707713|2.51022|0.707713|0|0|
|IGAMMA_P_MED|4.31041|0.770837|4.31041|0.749316|4.31041|0.743446|0.027391|0.00587|
|IGAMMA_P_BIG|412.953|48.2627|412.953|48.2623|412.953|48.2623|0.0004|0|
|IGAMMA_P_EXTRA|0.502906|0.12985|0.502906|0.12985|0.502906|0.12985|0|0|
|IGAMMA_Q_INT|5.46066|1.24445|4.83677|1.13616|4.83677|1.13616|0.10829|0|
|IGAMMA_Q_SMALL|5.08353|1.18629|3.74672|0.961876|3.74672|0.961876|0.224414|0|
|IGAMMA_Q_MED|6.81523|1.2909|5.35148|0.830843|5.35148|0.82354|0.46736|0.007303|
|IGAMMA_Q_BIG|536.101|60.993|536.101|60.9895|536.101|60.9895|0.0035|0|
|IGAMMA_Q_EXTRA|54.6793|15.1705|54.6793|15.1705|54.6793|15.1705|0|0|
|IGAMMA_LOWER_INT|5.41081|1.12354|5.41081|1.08145|5.41081|1.07932|0.04422|0.00213|
|IGAMMA_LOWER_SMALL|1.38617|0.472474|1.38617|0.472474|1.38617|0.472474|0|0|
|IGAMMA_LOWER_MED|6.432|1.27844|5.69098|1.11614|5.99945|1.15425|0.12419|-0.03811|
|IGAMMA_LOWER_BIG|7.29508|1.15197|7.29508|1.13324|7.29508|1.1326|0.01937|0.00064|
|IGAMMA_LOWER_EXTRA|4.45394|1.19886|4.45394|1.19886|4.45394|1.19886|0|0|
|IGAMMA_UPPER_INT|6.29133|1.55386|5.1089|1.34698|5.1089|1.36073|0.19313|-0.01375|
|IGAMMA_UPPER_SMALL|4.90231|0.937614|2.2915|0.78016|2.2915|0.78016|0.157454|0|
|IGAMMA_UPPER_MED|9.92377|2.16165|8.2182|1.65163|8.2182|1.68102|0.48063|-0.02939|
|IGAMMA_UPPER_BIG|14.4738|1.58529|7.28961|1.07161|7.28961|1.1512|0.43409|-0.07959|
|IGAMMA_UPPER_EXTRA|11.1293|3.59124|11.1293|3.59124|11.1293|3.59124|0|0|

The original method in numbers is worse. The max and RMS error are higher than 
the Boost continued fraction method.

An updated version using the ContinuedFraction implementation is sometimes 
better, sometimes worse than the Boost continued fraction. Overall it is 
marginally better but the differences are very small (these are measured in 
ULPs).

Note: I implemented Method 3 using ContinuedFraction instead of the code ported 
from Boost. It had fractionally worse error for all test data. 

The continued fraction implementations are both using the same Lentz algorithm 
with some differences in the Boost method:
 # The tiny value is much smaller: 3.5601181736115222e-307 vs 1e-50. In 
practice this branch is never used on the entire suite of test data even using 
1e-50 as the threshold.
 # There is no computation of the next value and checking it is not finite. The 
next value is computed without checks. Thus the ContinuedFraction class is more 
robust. However in this case the fraction is only evaluated when convergence is 
expected.
 # The Boost version never uses a max iterations and eps is set to the machine 
epsilon. The fraction should be evaluated when convergence is expected. The 
maximum iterations found in the test data was 51. The Boost code never checks 
the number of iterations.
 # The Boost termination is based on 'continue if (Math.abs(deltaN - 1) > 
epsilon' rather than 'stop if Math.abs(deltaN - 1) < epsilon'. Updating the 
Boost code to use 'continue if (Math.abs(deltaN - 1) *>=* epsilon' results in a 
maximum iterations found in the test data of 60 with +/- changes to the RMS 
error in the 3rd significant figure. So slower but no more accurate for the 
final result.

h3. Speed

I performed a JMH speed test of the methods using all cases from the test data 
that use the continued fraction evaluation. This has a mixture of data 
requiring max iterations from 5 to 62 for the modified ContinuedFraction method 
2. 
||implementation||Method||Score||Median||
|Numbers 1.0|evaluate|81397.7|80773.

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-09 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441330#comment-17441330
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. Improving the Regularised Gamma Function

The majority of error in this function is in computation of the prefix term:
{noformat}
(z^a) (e^-z) / tgamma(a){noformat}
Notes:
 # The tgamma(a) function is infinite above a ~ 171.
 # Computing tgamma(a) requires use of a power function and exponential and 
evaluation of the Lanczos approximation

To reduce error in computing two power terms and two exponential terms the 
Boost code uses an optimisation that combines the terms and calls pow and exp 
only once. See [Boost igamma Equation 
15|https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html].

This works well if evaluated using long double. When evaluated using double 
then there is some loss of precision with all the terms used in the 
rearrangement.

I updated the code to detect when a <= 171, and when the pow and exp functions 
will not overflow/underflow. If these conditions are met then the direct 
evaluation can be performed. Assuming pow and exp are within 1 ULP then the 
majority of error is from tgamma(a). I added more data with gamma values from 
150 to 171. Tests show that this function is accurate to around 4 ulp until the 
result overflows to infinity. So direct computation is valid until overflow 
conditions are reached.

Here are the results for the previous code and the new version which uses the 
direct computation of the result when possible.
||Data||Previous|| ||New|| || || ||
|| ||Max||RMS||Max||RMS||Max Diff||RMS Diff||
|GAMMA_P_DERIV_INT|22.4857|6.10843|3.16382|0.974935|-19.32188|-5.133495|
|GAMMA_P_DERIV_SMALL|3.12638|0.901515|3.12638|0.901515|0|0|
|GAMMA_P_DERIV_MED|57.2834|14.5155|4.39499|1.00498|-52.88841|-13.51052|
|GAMMA_P_DERIV_BIG|538.126|52.8767|538.126|52.8391|0|-0.0376|
|TGAMMAO_VERY_NEAR_0|3.16585|0.651873|3.16585|0.651873|0|0|
|IGAMMA_P_INT|19.7402|4.41416|3.33453|0.837093|-16.40567|-3.577067|
|IGAMMA_P_SMALL|2.51022|0.707713|2.51022|0.707713|0|0|
|IGAMMA_P_MED|57.4788|10.326|4.31041|0.743446|-53.16839|-9.582554|
|IGAMMA_P_BIG|412.953|48.3047|412.953|48.2623|0|-0.0424|
|IGAMMA_P_EXTRA|3.44946|1.00459|0.502906|0.12985|-2.946554|-0.87474|
|IGAMMA_Q_INT|13.4607|2.1156|4.83677|1.13616|-8.62393|-0.97944|
|IGAMMA_Q_SMALL|3.74672|0.961876|3.74672|0.961876|0|0|
|IGAMMA_Q_MED|41.1248|5.71137|5.35148|0.82354|-35.77332|-4.88783|
|IGAMMA_Q_BIG|536.101|60.9956|536.101|60.9895|0|-0.0061|
|IGAMMA_Q_EXTRA|54.6793|15.1705|54.6793|15.1705|0|0|

Use of the direct computation reduces error. It will require an additional 
evaluation of pow and exp. But the alternative computation requires a sqrt and 
lots of branch checks to determine overflow conditions.

 

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-08 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440819#comment-17440819
 ] 

Alex Herbert commented on NUMBERS-174:
--

{quote}Could [Numbers] provide an equivalent to Boost's "long double" 
computations?
{quote}
I do not know without trying. The exp function provides a 62-bit number. This 
is used by the pow function. It computes the power by using:
{noformat}
a^x = exp(x  ln a){noformat}
The log function also provides extended precision output for use by the exp 
function. So the log, pow and exp functions from AccurateMath would have to be 
updated to allow the extended precision output to be obtained. The result can 
then be computed using the 62-bit numbers.

BigDecimal cannot be used. It does not have a non-integer exponent power 
function; exp or log. Addition of these elementary functions to BigDecimal 
would be nice.

Dfp may be possible since it provides extended precision for the required 
elementary functions. Thanks for highlighting this. Use of that class would be 
worth investigating.

Currently I am focussed on obtaining what is possible with standard double 
precision. I believe I have found a fix that computes to better accuracy for 
the 'int' and 'medium' data for the incomplete gamma functions. It is the 'big' 
data that cannot be computed without loss of precision. In this case it is more 
about the function gracefully failing to compute rather than catastrophically 
failing to compute. Better precision would be a work for the future. Time would 
be better invested initially by looking into porting the Beta function which 
also has some evaluation issues (see STATISTICS-25 and NUMBERS-169).

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-08 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440635#comment-17440635
 ] 

Gilles Sadowski commented on NUMBERS-174:
-

It looks like major improvements are underway. :)
{quote}Note that Commons Math 4 AccurateMath can compute these functions to 62 
bit precision using a split double number. This is used internally for accurate 
function rounding. However the API is not exposed and the exp and pow results 
are limited to the 53-bit precision of a double.
{quote}
Could [Numbers] provide an equivalent to Boost's "long double" computations?
Could the 
[{{Dfp}}|https://gitbox.apache.org/repos/asf?p=commons-math.git;a=blob;f=commons-math-legacy-core/src/main/java/org/apache/commons/math4/legacy/core/dfp/Dfp.java;h=02f7df81e38d533d893e96597a9dbbf2cf7f390c;hb=HEAD]
 and [related 
classes|https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=commons-math-legacy-core/src/main/java/org/apache/commons/math4/legacy/core/dfp;h=c548c79c6e29a966e147994f6ad1857d71340285;hb=HEAD]
 be of use (and ported/refactored into [Numbers])?
Alternatively, should some of the computations use {{BigDecimal}} internally in 
order to ensure accuracy equivalent to Boost's?  Can this be done without "too 
much" loss in performance (e.g. in the domains where inaccuracies would show up 
in the {{double}} precision result)?

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-07 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440092#comment-17440092
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. Gamma P Derivative

This function is not in the gamma package. It is used in statistics for the 
gamma distribution.

This function is defined as:
{noformat}
f(a, z) = z^a * exp(-z) / Gamma(a) / a
= z^(a - 1) * exp(-z) / Gamma(a){noformat}
It can be implemented with logs:
{noformat}
f(a, z) = exp(a * log(z) - z - logGamma(a) - log(z)){noformat}
The code in Commons Statistics attempts to compute this with high accuracy but 
the choice of function can be incorrect (see STATISTICS-39).

I created derivative data using the same a and z values in the Boost datasets 
for the incomplete gamma functions. The result was computed using maxima to 64 
digits. The simple implementation using logs was tested against the function 
from Commons Stats and the new Boost implementation.
||Data||Log|| ||Stats|| ||Boost|| ||
|| ||Max||RMS||Max||RMS||Max||RMS||
|GAMMA_P_DERIV_INT|221.051|44.0093|88.5116|8.13219|22.4857|6.10843|
|GAMMA_P_DERIV_SMALL|104.312|20.5646|452.647|67.9785|3.12638|0.901515|
|GAMMA_P_DERIV_MED|476.683|128.642|66.1587|3.14963|57.2834|14.5155|
|GAMMA_P_DERIV_BIG|1.01E+07|1.02E+06|555.477|62.9605|538.126|52.8767|
h3. Int Data

The use of logs is worse than the stats or Boost method. The Boost method has 
much lower max error than the stats method but the RMS is similar.
h3. Medium Data

The use of logs is worse than the stats or Boost method. The Boost method has 
lower max error than the stats method but the RMS is worse for Boost. This can 
be traced to the following errors above 50 ulp:
{noformat}
stats:
GAMMA_P_DERIV_MED a=0.9759566783905029, z=0.009759566746652126: expected 
<1.091103827394005722772134927295910273305494319535696726782190113> != actual 
<1.091103827393991> (ulps=-66.15874662180883)

boost:
GAMMA_P_DERIV_MED a=79.829345703125, z=0.7982934713363647: expected 
<2.057948567669980623962755091227670264152698891095754996007072457E-125> != 
actual <2.0579485676699656E-125> (ulps=-57.28338674936721)
GAMMA_P_DERIV_MED a=84.98836517333984, z=0.8498836755752563: expected 
<1.584723140811257432756401992975844384590043887470012851849659925E-133> != 
actual <1.5847231408112464E-133> (ulps=-56.62359924370867)
{noformat}
So the Boost method has two occurrences where it fails to compute as 
accurately. It has a different method when z < 1 which computes:
{noformat}
Math.pow(z, a) * Math.exp(-z) / tgamma(a) / a
{noformat}
If changed to:
{noformat}
Math.pow(z, a - 1) * Math.exp(-z) / tgamma(a)
{noformat}
The errors above 50 ulp are eliminated:
{noformat}
GAMMA_P_DERIV_MED   max45.9872   RMS11.8938
{noformat}
This fix does not work for all parameters. The method should not be changed 
without further investigation of appropriate limits to switch computations.
h3. Small Data

Direct use of logs performs better on this data than the method from 
statistics. This is data that demonstrates the incorrect choice of evaluation 
method as discussed in STATISTICS-39. The Boost method has far superior 
performance on this data as it correctly chooses to compute the direct formula 
(Math.pow(z, a) * Math.exp(-z) / tgamma(a) / a) rather than using logs or 
alternative computations with the Lanczos approximation.
h3. Big Data

Use of logs here is incorrect. Both the stats and Boost method are using an 
appropriate rearrangement of the computation to avoid cancellation errors. The 
stats method is based on the documentation for the Boost function. The 
difference may be due to the use of the computation log1p(x) - x which has a 
special method in Boost (log1pmx(x)) to compute this without loss of accuracy 
when the argument x is close to 0.
h2. Overall

The method using logs is outperformed by other methods. This is the subject of 
MATH-753. Using logs can result in cancellation for some parameters where 
alternative computation is superior.

The method from statistics is successful in computing the result for large z. 
It fails when z is small. The Boost method is robust across all data. It would 
be recommended to expose it through a public API.

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-07 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440014#comment-17440014
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. Incomplete Gamma(a, z)
||Data||Modified|| ||Boost|| ||
|| ||Max||RMS||Max||RMS||
|IGAMMA_LOWER_INT|3.7574|0.95297|5.41081|1.07932|
|IGAMMA_LOWER_SMALL|1.38617|0.472474|1.38617|0.472474|
|IGAMMA_LOWER_MED|8.18929|1.36357|5.99945|1.15425|
|IGAMMA_LOWER_BIG|7.29508|1.17811|7.29508|1.1326|
|IGAMMA_UPPER_INT|6.92561|1.50143|5.1089|1.36073|
|IGAMMA_UPPER_SMALL|2.2915|0.78016|2.2915|0.78016|
|IGAMMA_UPPER_MED|9.0427|1.70797|8.2182|1.68102|
|IGAMMA_UPPER_BIG|6.49026|1.06544|7.28961|1.1512|
h3. Notes

This function is not implemented in the current numbers gamma library. It can 
be done by multiplying the value P or Q by gamma(a). When a > 171 or close to 
zero then the result is likely to be infinite.

For reference I updated the Boost function to call the current numbers 
implementations for gamma and log gamma.
h3. Performance

The functions compute accurately for each dataset. When the current 
implementation for gamma and log gamma is used (Modified) then the performance 
is very similar. There are a few cases where the Boost function is worse using 
its own gamma functions.

The worst case for IGAMMA_LOWER_INT is:
{noformat}
tgammaLower(20.5, 20.5){noformat}
The result is computed by multiplying a series term by the value of 
gamma(20.5). The result of this function is compared to the result from maxima 
to 64 digits:
{noformat}
gamma(20.5)

maxima
5.406242982335075044736873647808221412273166395781714789601948262e17

BOOST
5.4062429823350765e17 (2 ulp)

current
5.4062429823350726E17 (-4 ulp){noformat}
In this case the Boost function is more accurate. The error is in the term it 
is multiplied by. This term is too big. Thus when multiplied by the commons 
numbers gamma result the two errors cancel and the final result is closer to 
the expected answer.

The worst case for IGAMMA_UPPER_BIG is:
{noformat}
tgamma(95.06404113769531, 95.06404113769531);
{noformat}
Compute p via a lower gamma series multiplied by incompleteGammaPrefix: 
5.6549889907651e146.

In this case the series is evaluated using an optimisation to reduce the number 
of iterations.
{noformat}
initValue = -a * tgamma(a) / fullIgammaPrefix(a, x)
{noformat}
tgamma(a) is used for the initial value. Again the terms can be checked against 
the true answer:
{noformat}
gamma(95.06404113769531)

maxima
1.455104768555175739015723834289112471039232295182085666150559959e146

BOOST
1.455104768555176E146  (2 ulp)

current
1.4551047685551754E146 (-1 ulp)
{noformat}
So this is a case where the current implementation is closer to the true result 
for the gamma(z) function.

Checking cases for the worst errors shows that the final accuracy can be 
dependent on a few ULP differences in evaluation of the gamma function. Given 
that the Boost gamma function is overall marginally more accurate when tested 
against a range of data for z > 20 this is not a difference that is considered 
significant. It would be recommended to keep the Boost gamma implementation.
h2. Overall

This function is accurate when the result is computable (i.e. not infinite). It 
would be recommended to expose it through a public API.

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-07 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440011#comment-17440011
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. Regularised Incomplete Gamma(a, z) : P and Q
||Data||Current|| ||Boost|| ||
|| ||Max||RMS||Max||RMS||
|IGAMMA_P_INT|255.083|39.2453|19.7402|4.41416|
|IGAMMA_P_SMALL|32.9236|11.2031|2.51022|0.707713|
|IGAMMA_P_MED|963.153|163.467|57.4788|10.326|
|IGAMMA_P_BIG|1.63E+07|1.14E+06|412.953|48.3047|
|IGAMMA_Q_INT|300.539|48.5842|13.4607|2.1156|
|IGAMMA_Q_SMALL|5.66E+11|7.27E+10|3.74672|0.961876|
|IGAMMA_Q_MED|1071.29|176.451|41.1248|5.71137|
|IGAMMA_Q_BIG|1.63E+07|1.25E+06|536.101|60.9956|
h3. Notes

The regularised incomplete gamma functions have a value between 0 and 1. They 
are the incomplete gamma function divided by gamma(a). When a > 171 then 
gamma(a) is infinite and the normalisation uses log gamma. The functions P and 
Q can be expressed as a series (P) or a continued fraction (Q). There are other 
methods of evaluation too.

The current implementation always the series or continued fraction and uses log 
gamma for normalisation. This may result is inaccuracy if gamma(z) is 
computable.

The Boost implementation has special handling to use different computations for 
different ranges of the arguments. It normalises without logs if possible. It 
avoids the series and continued fractions when the result is close to 0.5 and 
the series are slow to converge. It has special handling for large arguments to 
attempt to compute what is possible given the precision of the floating-point 
type.
h3. Int Data

The current function computes reasonable accuracy for this data. The Boost 
function is an order of magnitude more accurate.
h3. Small Data

The current function computes reasonable accuracy for this data for P. The 
Boost function is an order of magnitude more accurate.

The current function has very large max errors for Q. This is for data 
approaching q=0. The Boost function maintains accuracy for this data.
h3. Medium Data

The current function has moderate accuracy for this data. It has no obvious 
failings with very large max error. The Boost function is again over an order 
of magnitude more accurate.
h3. Big Data

The current function fails dramatically on this data. The Boost function is 
reasonable.

I have tracked the Boost error for this data for the worst case. The function 
is:
{noformat}
gamma_q(a=2057.796630859375, z=4115.59326171875){noformat}
In this case the function must evaluate a power term to be multiplied by a 
series value:
{noformat}
Math.pow(z, a) * Math.exp(-z)
{noformat}
This term is subject to large cancellation errors. Without special handling the 
power function overflows and the exp function underflows. If replaced with the 
exact value computed using extended precision then the result of the gamma_q 
function on these arguments is correct. Thus the default Boost policy to 
evaluate in long double for a double result handles this data. A long double 
typically has a 15 bit exponent for a maximum value of 2{^}+/-16383{^} and a 
60-bit mantissa. This can handle these values for the power and exp term 
without cancellation. When evaluated using 53-bit precision this function is at 
the limit of what can be computed.

The only way to improve accuracy for the Boost implementation in this case is 
an extended precision pow and exp function. Note that Commons Math 4 
AccurateMath can compute these functions to 62 bit precision using a split 
double number. This is used internally for accurate function rounding. However 
the API is not exposed and the exp and pow results are limited to the 53-bit 
precision of a double.
h2. Overall

The Boost function is superior for all data. It is particularly robust to 
extreme inputs. The error for these arguments is due to the limited precision 
of the power and exponential functions.

This would recommend switching to the Boost function.

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideal

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-07 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440007#comment-17440007
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. Log Gamma(z)
||Data||Current|| ||Boost|| ||
|| ||Max||RMS||Max||RMS||
|LGAMMA_FACTORIALS|3.32911|0.8279|0.645222|0.0682534|
|LGAMMA_NEAR_0|1.09044|0.452423|1.09044|0.452423|
|LGAMMA_NEAR_1|1.73052|0.554198|1.43922|0.577705|
|LGAMMA_NEAR_2|1.70538|0.681634|0.551573|0.120172|
|LGAMMA_NEAR_M10|5.59972|1.72064|2.89892|0.850624|
|LGAMMA_NEAR_M55|1.74026|0.729566|0.740256|0.352296|
|LGAMMA_M20_0|156.708|13.3007|92.708|7.90596|
|LGAMMA_0_20|2.14863|0.76859|0.844097|0.188636|
|LGAMMA_20_150|2.04684|0.693322|1.28226|0.331251|
|LGAMMA_VERY_NEAR_0|1.63002|0.416674|1.61922|0.271574|
h3. Notes

Negative arguments require special handling. For the current implementation I 
added the functionality from Boost so the function returns a result for 
negative z.

The current function computes using different methods for z in [0, 0.5), [0.5, 
2.5], (2.5, 8], (8, inf). The final range uses the Lanczos approximation.

The Boost function computes using different methods for z in [0, 1.5e-8), z < 
15, z < 100, z >= 100. For z < 15 there is a dedicated function for small z 
which computes accurately at the roots for z=1, z=2. For z < 100 the function 
computes the log of tgamma(z). Above z=100 the Boost function used a different 
set of coefficient for the Lanczos approximation for log gamma. The 
coefficients have been divided by the term exp(g) where g is the Lanczos 
coefficient. This avoids having to subtract log(g) during the computation. 
h2. Overall

The two functions are equivalent near 0. Very near zero the Boost function is 
better. On the other data the Boost function is more accurate with the 
exception of a slightly higher RMS error for the near 1 dataset. The max error 
on this dataset is lower.

This would recommend switching to the Boost function.

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  implement all the suggested optimisations detailed in the most recent
>  Boost documentation. This includes avoiding using the converging
>  series of the gamma function when the convergence is slow or unstable,
>  i.e. addresses the need for a configurable convergence threshold in
>  the Poisson distribution.
> One part of the evaluation of the incomplete gamma functions require
>  accuracy in the leading term:
> {noformat}
> x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a){noformat}
> When x and a are large then using logs to compute this leads to
>  cancellation. Use of logs to compute this term is done in the current
>  implementation in numbers for RegularizedGamma P and Q. It is the
>  source of low precision for the CDF for the Poisson distribution when
>  the mean is large.
> I propose to port the Boost gamma functions to numbers. This will be a
>  process similar to the recent port of the error functions to the same
>  package in numbers. These functions improved both accuracy and
>  speed over the existing implementation. Once the gamma functions are
>  ported a comparison can be made between the existing and new
>  implementations. 
> [1] 
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-07 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439960#comment-17439960
 ] 

Alex Herbert commented on NUMBERS-174:
--

h2. Gamma(z)
||Data||Current|| ||Modified|| ||Boost|| ||
|| ||Max||RMS||Max||RMS||Max||RMS||
|TGAMMA_FACTORIALS|3.89345|1.08579| | |2.2426|0.619793|
|TGAMMA_NEAR_0|1.46384|0.550829| | |3.16585|1.10525|
|TGAMMA_NEAR_1|1.10029|0.573537| | |3.10029|1.00016|
|TGAMMA_NEAR_2|0.985467|0.413246| | |2.73141|1.10202|
|TGAMMA_NEAR_M10|1.56073|0.534165| | |2.37076|1.01606|
|TGAMMA_NEAR_M55|7.08E+06|1.29E+06|2.76601|0.945008|2.2891|1.08438|
|TGAMMA_M20_0|2.7161|0.74157| | |4.35554|1.26587|
|TGAMMA_0_20|1.86447|0.54758| | |2.97733|1.09463|
|TGAMMA_20_150|Infinity|Infinity|5.77643|1.8674|3.35312|0.994272|
|TGAMMA_VERY_NEAR_0|3.30574|0.567205| | |3.30574|0.567205|
h3. Notes

Both implementations switch to using a [Lanczos 
approximation|https://www.boost.org/doc/libs/1_64_0/libs/math/doc/html/math_toolkit/lanczos.html]
 at abs(z) > 20. The Boost version also uses this approximation below 20. 
Negative values are shifted into the positive domain using recursion properties 
of the gamma function. Very close to zero (z < 1.5e-8) there is a switch to 
gamma(z) = 1/z - E where E is Euler's constant. This data is tested using 
TGAMMA_VERY_NEAR_0.
h3. |z| < 20

The current implementation of the gamma function is more accurate than the 
Boost version when |z| < 20. The Boost version uses the Lanczos approximation 
here whereas the current implementation using the NSWC Library of Mathematics 
Subroutines has a dedicated routine suitable for double precision. Note that 
the default policy in Boost is to promote double to long double for 
computations. I verified that if this is performed then the C++ Boost version 
has zero error. Thus the implementation is partly relying on having extended 
precision to act as guard digits during evaluation of the Lanczos approximation 
at small z.
h3. |z| very small

Both implementations are equal when |z| approaches 0.
h3. z > 20

This uses the Lanczos approximation. This value is then multiplied by power 
terms. In the current code this is:
{noformat}
pow(z + g + 0.5, z + 0.5) / exp(z + g + 0.5)

g = Lanczos constant (about 4.74)
{noformat}
These terms can be very large and there is no detection of potential overflow 
here. Thus the current code has infinite error when overflow begins. In the 
modified column I have rerun the code with no values that cause overflow (z > 
141.5). In this case the Boost code is still more accurate.

The Boost code has overflow detection and a modification to compute the power 
terms with a rearrangement. It can compute gamma(z) up to the limit of z at 
around 172.
h3. Negative z

The result for negative z is
{noformat}
result =  -pi / z * sin(pi * z) * gamma(|z|)
{noformat}
The term {{z * sin(pi * z)}} must be evaluated correctly. If not then you get 
very large errors as shown for the M55 case for the current code. In the 
modified column I updated the code to use the Boost sinpx function. Here the 
error reduces to close to the error of the Boost version.
h3. Factorials

The Boost implementation uses precomputed values of the factorial for all 
integers. This gives it an advantage on this data. I did not rerun the test 
using only half-integers to eliminate this advantage.
h2. Overall

The current implementation is very accurate. It is better than the Boost 
implementation at z < 20. At large |z| the Boost implementation is marginally 
better across the range of tested values (20 - 150). It handles z up to the 
limit of the gamma function and has special evaluation of z * sin(pi * z) for 
negative z.

This would recommend switching to a hybrid function combining the current 
implementation for |z| < 20 with the Boost for |z| > 20. 

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a thres

[jira] [Commented] (NUMBERS-174) Update Gamma functions using the Boost implementation

2021-11-06 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439909#comment-17439909
 ] 

Alex Herbert commented on NUMBERS-174:
--

I have ported a set of Gamma functions from Boost C++ to Java:
||Function||Description||Present in Commons Numbers gamma package||
|tgamma(z)|'True' gamma function|Y. Gamma.value(z)|
|lgamma(z)|Log of the gamma function|Y. LogGamma.value(z)
 
Support in numbers is only for x > 0. The Boost function adds support for 
negative numbers (where gamma(z) can be positive or negative) by returning the 
log of the absolute value of the gamma function of z: log(abs(gamma(z))). The 
sign of the gamma(z) result is returned as an output.|
|tgamma(a, z)|Incomplete upper gamma function|N|
|tgamma_lower(a, z)|Incomplete lower gamma function|N|
|gammaP|Regularised incomplete lower gamma function|Y. 
RegularizedGamma.P.value(z)|
|gammaQ|Regularised incomplete upper gamma function|Y. 
RegularizedGamma.Q.value(z)|
|gammaPDerivative|Derivative of the incomplete lower gamma function|N. This is 
the probability density function of the Gamma distribution. There is an 
implementation in Commons statistics.|
h2. Test Data

The Boost C++ code has test files which I have used and added to with 
additional data generated using maxima:
h3. Gamma Data
||Data||Source||Description||
|Gamma factorials|Boost|Integers and half-integers in the range 1 to 99.5|
|Gamma near 0|Boost|z in [-0.5, 0.5]|
|Gamma near 1|Boost|z in [0.5, 1.5]|
|Gamma near 2|Boost|z in [1.5, 2.5]|
|Gamma near -10|Boost|z around the negative pole at -10|
|Gamma near -55|Boost|z around the negative pole at -55|
|Gamma 20 - 0|maxima|z in [-20, 0] in intervals of 0.125 excluding integers|
|Gamma 0 - 20|maxima|z in [0, 20] in intervals of 0.125 excluding integers|
|Gamma 20 - 150|maxima|z in [20.75, 150.25] in intervals of 0.5|
|Gamma very near 0|maxima|z in +/-[2^-14, 2^-60] in powers of 2|

Note the gamma evaluation uses a Lanczos approximation when z > 20. Evaluation 
for negative values requires additional computation. Thus extra datasets 
targeting -20 to 0, 0 to 20 and above 20 have been created to test the Boost 
implementation against the current implementation with and without Lanczos 
support.

The limit for gamma(z) is approximately 171. Note that gamma(z) = (n-1)! and 
the maximum representable factorial as a double is 170.
h3. Incomplete Gamma Data
||Data||Source||Description||
|igamma int|Boost|a in [0.5, 37] integer of half-integer; z chosen to test the 
extremes of the p and q result at 0, 0.5, and 1.0|
|igamma small|Boost|a in [0.165e-11, 0.05], z chosen to test the range of the p 
and q result at 0, 0.5, and 1.0|
|igamma medium|Boost|a in [0.97, 99.65], z chosen to test the range of the p 
and q result at 0, 0.5, and 1.0|
|igamma big|Boost|a in [0.15e-5, 1736170], z chosen to test the extremes of the 
p and q result at 0, 0.5, and 1.0. Many expected results in this dataset are 
zero or infinity as a double.|
|gamma p deriv|maxima|a and z in [0.25 + 2^i / 8] with i in [0, 13]; range 
0.375 to 1024.25|
h2. Results Analysis

Results files have approximately 50 to 300 data points. Expected results are 
evaluated to at least 64 significant figures and compared to the actual 
computed values using BigDecimal to obtain fractional ULP differences. Any 
computed result value below 0.5 ulp from the expected result is exact. 
Otherwise the ulp difference is used to compute the maximum error and the root 
mean square error (RMS).

Results for each function will follow in separate posts. 

> Update Gamma functions using the Boost implementation
> -
>
> Key: NUMBERS-174
> URL: https://issues.apache.org/jira/browse/NUMBERS-174
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: gamma
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.1
>
>
> The current regularised incomplete gamma functions for P and Q compute using 
> a series representation for all input. This method is not robust to extreme 
> arguments.
> The Gamma functions are used in Commons Statistics in the Gamma and Poisson 
> distributions.
> Large values of the mean in the Poisson distribution have low
>  precision for the CDF. The distribution also has a configurable
>  threshold for convergence of the gamma function evaluation
>  (STATISTICS-38). Ideally this implementation detail should be
>  removed from the API.
> The Gamma distribution has a function switch based on a threshold to
>  compute the PDF. This threshold results in incorrect values for
>  certain parameters (STATISTICS-39).
> The function switch in the Gamma distribution is based on the
>  documentation for the Boost gamma functions [1]. However it does not
>  impl