Hello.

On Thu, 16 Sept 2021 at 00:46, Alex Herbert <[email protected]> wrote:
>
> On Wed, 15 Sept 2021 at 17:10, Gilles Sadowski <[email protected]> wrote:
>
> > Hello.
> >
> > On Tue, 14 Sept 2021 at 17:13, Alex Herbert <[email protected]> wrote:
> > >
> > > The statistics component is a candidate for a release.
> > >
> > > The statistics distributions module contains mature functionality
> > > ported from CM. The dependency on [numbers] is now satisfied as that
> > > has had an official release. There is nothing outstanding in the
> > > project Jira. Thus a first release of this component can be performed.
> > >
> > > Items before a release:
> > >
> > > - Remaining Jira tickets should be checked and resolved
> >
> > Can we set to "resolved" the following two:
> > https://issues.apache.org/jira/projects/STATISTICS/issues/STATISTICS-3
> >
> Since the JDK Math is supposed to be within 1 ULP of the correct value for
> exp, log1p and pow (the functions used in place of CM AccurateMath), the
> accuracy issues are due to rounding of intermediates. Looking at the
> tests, the tolerance has increased from 1.5 to 2 for the mean ULP in one
> test. This seems like an acceptably low ULP error from the exact result.
> In the other test, the tolerance increased from 220 to 230 ULP for the
> standard deviation of the ULP (which must have a mean below 160). Although
> high in ULPs, the increase is less than 5% of a tolerance which appears
> rather arbitrary to begin with (and was probably chosen just to make the
> test pass). I would say this ticket is not a problem.

Thanks; resolved as such.

> >
> > https://issues.apache.org/jira/projects/STATISTICS/issues/STATISTICS-25
> >
> Here the implemented fix is computing results almost as well as python and
> R (which compute up to 1e10 to 1e20 degrees of freedom) versus the
> threshold of 2.99e6 used in statistics. I would say it is not resolved but
> is not a blocker.

OK.

> >
> > ?
> >
> > It would be nice to have
> > https://issues.apache.org/jira/projects/STATISTICS/issues/STATISTICS-9
> > in an "examples" module.
> >
>
> OK. The examples module can be based on code in RNG which uses picocli to
> build programs. I suggest a program that accepts the name of the
> distribution as the command. Each command would have inputs for the
> distribution parameters, the range of points to evaluate and the number of
> steps in the range. It would output a csv format:
>
> x,pdf(x)
>
> For example for the exponential:
>
> java -jar statistics.jar exp --mean 3.45 --min 0 --max 20 --steps 200 --function cdf
>
> This should not be too complicated.

I was seeing this as a "visual" validation test, so making it as simple
as possible (perhaps even with "interesting" values being hard-coded).
[Also, I don't think that, versatile or not, such an application will be
much used beyond generating all the plots that could help spot a problem
(as you did with Ziggurat).]
So the output should perhaps be fixed:
  x, pdf, cdf, inv cdf, survival

> If the desire is to generate data for figures with multiple
> parameterisations then the parameters can be comma delimited:
>
> java -jar statistics.jar gamma --shape 1,2,3 --scale 2,2,2 --min 0 --max 20 --steps 200 --function pdf

Given that useful "min", "max" and "steps" could be dependent on
the parameters, I don't think that this feature would be very useful.
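
To make the fixed-output idea concrete, here is a rough sketch only. It uses the plain JDK: no picocli and no [statistics] API. The class name `ExponentialTable` and its methods are made up for illustration, and the exponential distribution formulas are written out by hand. Since the inverse CDF takes a probability rather than x, the last column evaluates it at cdf(x), which should round-trip to x (up to rounding).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: tabulates an exponential distribution as CSV rows. */
public class ExponentialTable {
    // Density of the exponential distribution with the given mean.
    static double pdf(double x, double mean) {
        return x < 0 ? 0 : Math.exp(-x / mean) / mean;
    }

    // CDF computed with expm1 to limit cancellation for small x.
    static double cdf(double x, double mean) {
        return x <= 0 ? 0 : -Math.expm1(-x / mean);
    }

    // Survival probability: 1 - cdf(x), computed directly.
    static double survival(double x, double mean) {
        return x <= 0 ? 1 : Math.exp(-x / mean);
    }

    // Inverse CDF for a probability p in [0, 1).
    static double inverseCdf(double p, double mean) {
        return -mean * Math.log1p(-p);
    }

    // One header row followed by (steps + 1) evenly spaced points in [min, max].
    static List<String> table(double mean, double min, double max, int steps) {
        List<String> rows = new ArrayList<>();
        rows.add("x,pdf,cdf,survival,icdf(cdf(x))");
        for (int i = 0; i <= steps; i++) {
            double x = min + (max - min) * i / steps;
            double p = cdf(x, mean);
            rows.add(String.format(Locale.ROOT, "%s,%s,%s,%s,%s",
                    x, pdf(x, mean), p, survival(x, mean), inverseCdf(p, mean)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hard-coded "interesting" values, matching the example command line.
        table(3.45, 0, 20, 200).forEach(System.out::println);
    }
}
```

The output can be redirected to a file and plotted with any tool; the round-trip column gives a quick sanity check that cdf and inverse cdf agree.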
>
> Output would be multiple columns:
>
> x,gamma(shape=1;scale=2),gamma(shape=2;scale=2),gamma(shape=3;scale=2),
>
> An alternative is to have the input points determined by a file:
>
> java -jar statistics.jar gamma --shape 1,2,3 --scale 2,2,2 --points input.txt --function pdf

Given the intended usage, I don't see the advantage (and the
drawback would be having to maintain many input list files).

Regards,
Gilles

>
> Functions to support are: pdf, cdf, inverse cdf and survival probability.
>
> Thoughts on this?
>
> Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
