Re: [statistics] Release 1.0

Gilles Sadowski Thu, 16 Sep 2021 04:05:57 -0700

Hello.

On Thu, 16 Sep 2021 at 00:46, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> On Wed, 15 Sept 2021 at 17:10, Gilles Sadowski <gillese...@gmail.com> wrote:
>
> > Hello.
> >
> > On Tue, 14 Sep 2021 at 17:13, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> > >
> > > The statistics component is a candidate for a release.
> > >
> > > The statistics distributions module contains mature functionality
> > > ported from CM. The dependency on [numbers] is now satisfied as that
> > > has had an official release. There is nothing outstanding in the
> > > project Jira. Thus a first release of this component can be performed.
> > >
> > > Items before a release:
> > >
> > > - Remaining Jira tickets should be checked and resolved
> >
> > Can we set to "resolved" the following two:
> >   https://issues.apache.org/jira/projects/STATISTICS/issues/STATISTICS-3
>
>
> The JDK Math functions are supposed to be within 1 ULP of the correct value
> for exp, log1p and pow (the functions used in place of CM AccurateMath), so
> the accuracy issues are due to rounding of intermediates. Looking at the
> tests, the tolerance for the mean ULP has increased from 1.5 to 2 in one
> test. This seems like an acceptably low ULP from the exact result. In the
> other test the tolerance increased from 220 to 230 ULP for the standard
> deviation of the ULP (which must have a mean below 160). Although high in
> ULPs, the increase is less than 5% of the tolerance, which appears rather
> arbitrary to begin with (and was probably chosen just to make the test
> pass). I would say this ticket is not a problem.


Thanks; resolved as such.
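
For reference, a minimal sketch of how an error in ULPs and its mean and
standard deviation can be measured. This is not the code of the actual
unit tests: the class and method names are hypothetical, the reference
values are made up for illustration, and the use of the sample standard
deviation (n - 1 denominator) is an assumption.

```java
/** Measures the error of computed doubles against reference values, in ULPs. */
final class UlpError {
    private UlpError() {}

    /** Absolute error in units of the last place of the reference value. */
    static double ulpError(double computed, double reference) {
        return Math.abs(computed - reference) / Math.ulp(reference);
    }

    /** Arithmetic mean of the values. */
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample standard deviation (assumption: n - 1 denominator). */
    static double standardDeviation(double[] values) {
        double m = mean(values);
        double sumSq = 0;
        for (double v : values) {
            double d = v - m;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / (values.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical data: the computed values are 0, 1 and 2 ULPs
        // away from their reference by construction.
        double[] reference = {1.0, 2.0, 4.0};
        double[] computed = {1.0, Math.nextUp(2.0), Math.nextUp(Math.nextUp(4.0))};
        double[] errors = new double[reference.length];
        for (int i = 0; i < errors.length; i++) {
            errors[i] = ulpError(computed[i], reference[i]);
        }
        System.out.println("mean ULP error = " + mean(errors));
        System.out.println("sd of ULP error = " + standardDeviation(errors));
    }
}
```

A test tolerance such as "mean ULP below 2" would then be an assertion on
the value returned by mean(errors).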

> >
> >   https://issues.apache.org/jira/projects/STATISTICS/issues/STATISTICS-25
>
>
> Here the implemented fix computes results almost as well as Python and
> R (which compute up to 1e10 to 1e20 degrees of freedom) versus the
> threshold of 2.99e6 used in statistics. I would say it is not resolved but
> is not a blocker.

OK.

> >
> > ?
> >
> > It would be nice to have
> >   https://issues.apache.org/jira/projects/STATISTICS/issues/STATISTICS-9
> > in an "examples" module.
> >
>
> OK. The examples module can be based on code in RNG which uses picocli to
> build programs. I suggest a program that accepts the name of the
> distribution as the command. Each command would have inputs for the
> distribution parameters, the range of points to evaluate and the number of
> steps in the range. It would output a csv format:
>
> x,pdf(x)
>
> For example for the exponential:
>
> java -jar statistics.jar exp --mean 3.45 --min 0 --max 20 --steps 200
> --function cdf
>
> This should not be too complicated.

I was seeing this as a "visual" validation test, so making it as
simple as possible (perhaps even with "interesting" values being
hard-coded).  [Also, I don't think that, versatile or not, such an
application will be much used beyond generating all the plots
that could help spot a problem (as you did with Ziggurat).]

So the output should perhaps be fixed:

x, pdf, cdf, inv cdf, survival
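
A minimal sketch of what such a fixed-format table could look like, for
an exponential distribution with hard-coded parameters. To stay
self-contained it uses the closed-form expressions of the exponential
distribution rather than the classes of the distributions module; the
class and method names are hypothetical. Two further assumptions: the
"inv cdf" column is the inverse CDF evaluated at cdf(x), so that it
should reproduce x, and "steps" intervals give steps + 1 rows.

```java
/** Prints a fixed-format CSV table for one hard-coded exponential distribution. */
final class ExponentialTable {
    private ExponentialTable() {}

    static double pdf(double x, double mean) {
        return x < 0 ? 0 : Math.exp(-x / mean) / mean;
    }

    static double cdf(double x, double mean) {
        // 1 - exp(-x / mean), computed with expm1 for small x.
        return x <= 0 ? 0 : -Math.expm1(-x / mean);
    }

    static double survival(double x, double mean) {
        return x <= 0 ? 1 : Math.exp(-x / mean);
    }

    static double inverseCdf(double p, double mean) {
        // Solves p = 1 - exp(-x / mean) for x.
        return -mean * Math.log1p(-p);
    }

    /** Builds the table: a header line, then steps + 1 rows from min to max. */
    static String table(double mean, double min, double max, int steps) {
        StringBuilder sb = new StringBuilder("x,pdf,cdf,inv cdf,survival\n");
        for (int i = 0; i <= steps; i++) {
            double x = min + (max - min) * i / steps;
            double p = cdf(x, mean);
            sb.append(x).append(',')
              .append(pdf(x, mean)).append(',')
              .append(p).append(',')
              .append(inverseCdf(p, mean)).append(',')
              .append(survival(x, mean)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hard-coded "interesting" values (taken from the example in this thread).
        System.out.print(table(3.45, 0, 20, 200));
    }
}
```

Plotting the "inv cdf" column against "x" should give a straight line;
any deviation would be a hint of a problem in one of the two functions.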

> If the desire is to generate data for figures with multiple
> parameterisations then the parameters can be comma delimited:
>
> java -jar statistics.jar gamma --shape 1,2,3 --scale 2,2,2 --min 0 --max 20
> --steps 200 --function pdf

Given that useful "min", "max" and "steps" could be dependent on
the parameters, I don't think that this feature would be very useful.

>
> Output would be multiple columns:
>
> x,gamma(shape=1;scale=2),gamma(shape=2;scale=2),gamma(shape=3;scale=2),
>
> An alternative is to have the input points determined by a file:
>
> java -jar statistics.jar gamma --shape 1,2,3 --scale 2,2,2 --points
> input.txt --function pdf

Given the intended usage, I don't see the advantage (and the
drawback would be having to maintain many input list files).

Regards,
Gilles

>
> Functions to support are: pdf, cdf, inverse cdf and survival probability.
>
> Thoughts on this?
>
> Alex
