Since there's really no good comprehensive statistics library for D (Tango has a little, and the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary. Almost by accident, it seems I've built up the beginnings of a decent statistics library. I'm debating whether it would be interesting enough to people to be worth releasing, and whether enough community help would be available to make it production quality or to merge it with other people's efforts in this area. The following functionality is currently available:

Correlation (Pearson, Spearman rho, Kendall tau). Note that the Kendall tau implementation is an efficient O(N log N) version.
Mean, standard deviation, variance, kurtosis, percent variance for arrays of numeric values.
Shannon entropy, mutual information.
Kolmogorov-Smirnov tests.
Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs; hypergeometric, Poisson, binomial PDFs.
Inverse normal distribution, and normally distributed random number generation.
A struct to generate all possible permutations of a sequence.

On the other hand, I'm a scientist, not a full-time programmer, and although I can write working code, I have no clue what it takes to get code up to the gold standard of "production." Also, this library is very D2-dependent, and I have no interest in back-porting it. Of course, if by some chance someone else wanted to back-port it, they'd be more than welcome.

Most of the code is covered one way or another by unit tests, although I cheated a lot by having some unit tests depend on multiple functions.

Is there any interest in this from others in the D community? Do other people think that D would benefit from having a decent statistics library? Other comments?