Hi Tom,

It's great to see more complete GSL probability support in PDL. I initially built the GSL CDF bindings because I needed to look up probabilities for statistical tests. Then it was "what the heck, I've got this far," and I built some distribution fitting around that, but it was definitely not as comprehensive as it could be.

The only hesitation I have about the PDL::Probability namespace with regard to the GSL bindings is that there is already a PDL::GSL namespace, and it makes the relationship to GSL clear. That's why the CDF binding is PDL::GSL::CDF instead of PDL::Stats::CDF. In an ideal world we would have complete PDL bindings for GSL. Should we keep the PDL::GSL namespace for all things GSL, or are we okay with scattering GSL across different PDL packages?

Best,
Maggie

On Fri, Jul 6, 2012 at 8:10 PM, Tom Nishimura <[email protected]> wrote:
> David,
>
> Thank you for your review and response. I've inlined my
> thoughts/questions as well:
>
> > On Tue, Jul 3, 2012 at 9:31 PM, Tom Nishimura <[email protected]> wrote:
> >
> >> Hello Piddlers,
> >>
> >> I'm working on a module called PDL::Probability which hopes to bind
> >> GSL's randist functions and provide a clean interface to them. It is in
> >> its very early stages and is viewable at (most of the work has been in
> >> the GSL/ subdir):
> >>
> >> https://github.com/tnishimura/PDL-Probability
> >>
> >
> > Project layout looks decent. These days I use Module::Build for all of my
> > PDL::PP code because I feel that M::B makes it much easier to manage
> > everything. I'll be happy to help you switch over to M::B if you would
> > like to try that. Handling your C-level tests is a little tricky, though;
> > see my comments at the end.
>
> Please see questions about this at the end.
>
> >> Most of the binding (namely, for the univariate distributions) is
> >> dynamically generated by GSL/t/gsl_randist.pp using an annotation file
> >> that I created:
> >>
> >> https://github.com/tnishimura/PDL-Probability/blob/master/GSL/share/gsl_randist.yml
> >>
> >
> > Excellent. This is a great way to handle this sort of thing. Does this
> > mean that adding new functions (as GSL's random distributions grow) is
> > simply a matter of adding something to the YAML file?
>
> Yes, as long as it's univariate and it follows the same conventions as
> the others. I haven't done this, but I should go through the GSL
> changelogs and add a 'version' entry to each distribution in the YAML to
> signify the lowest version of GSL which has the distribution, and have
> gsl_randist.pp generate accordingly.
>
> >> 2. Bind samplers (gsl_ran_*'s) in a more "threadable" way. The sampler
> >> bindings in PDL::GSL::RNG sometimes place parameters in OtherPars,
> >> making them non-threadable (like gsl_ran_tdist()'s second parameter).
> >>
> >
> > The gsl_randist.pp file uses a great set of machinery to generate
> > everything. This is well implemented. It must have been hard work, but it
> > looks like it's paid off nicely. I had to create similar machinery for
> > PDL::Drawing::Prima, though I wasn't smart enough to YAMLify my data. I
> > may consider trying that for my work.
>
> Thanks to the GSL people for making a mostly-consistent interface.
>
> >> 3. Test by comparing to C calls. 'make test' runs the C version of
> >> every GSL randist function at multiple values, and the output of the
> >> C calls is compared to the PDL versions (in GSL/t/compare-to-c.t). Right
> >> now, PDL::GSL::RNG only tests whether the functions don't die (in
> >> gsl_rng.t).
> >
> > Excellent test suite, much more robust than the current one.
>
> Thanks, this part was the hardest.
>
> >> 5. In the future I'd like to create similar sets of functions for
> >> distributions not in GSL, like multivariate normal and t-distributions.
> >> My original intention was to create the ultimate probability function
> >> library (not just those in GSL), thus the module name. Maybe this is
> >> better done in separate modules.
> >>
> >
> > If you have the knowledge to create such functions (i.e. you know the
> > algorithms), I'd be delighted. I don't use probability functions very
> > often, but others in my lab do---via Python.
> > Having a full set of probability functions for various statistical
> > tests would be enormously helpful when I start doing such calculations,
> > or if I ever try to convince some of them to try Perl/PDL.
>
> If I'm going to keep the name 'PDL::Probability', I wouldn't want it to
> be missing any "must-haves" that would discourage its use. Right now the
> list is mv normal and t, and perhaps inverse-gamma, Wishart, Frechet, and
> a few others.
>
> However, there are way too many "esoteric" distributions for me to
> attempt to implement (and statistically test, which is much harder)
> myself. I'm thinking of creating a sub-namespace like
> PDL::Probability::Distributions or something (suggestions welcome) to
> place non-GSL distribution functions, so that other modules can cleanly
> add to the namespace. Haven't thought about how to generate the
> alternative interfaces for such functions, though... perhaps make those
> dists provide a YAML of their own, or maybe let them register alternate
> interfaces themselves. This is all half-baked.
>
> > If you want to explore potential architectures while at the same time
> > being particularly cautious and backward compatible for early adopters,
> > you could name your first try with something like
> > PDL::Probability::<Module-name>::RC1. After feedback from the real world,
> > if you find that this module has the interface you like, you simply
> > rebrand it as PDL::Probability::<Module-name> and fill the original RC1
> > modules with wrappers to the new function names. If you change your mind
> > on an API detail, you can release RC2, leaving RC1 on CPAN for backwards
> > compatibility. That way people can start using your work and giving
> > feedback without having to fear their code breaking when you change
> > something.
>
> I've never seen that done in CPAN (though I'm usually not an
> early-adopter) but that does seem like a good idea.
>
> >> Please note that all development is being done with Linux (CentOS 5 and
> >> Slackware 13.37), perl 5.12+, PDL 2.4.10+, gsl 1.13+. I don't think
> >> it'll work on Windows yet because of hackery in the makefile.
> >>
> >
> > I've seen much crazier hackery, and I believe that this should work with
> > little or no modification on Windows. Matt Trout is both a Perl god and
> > an EU::MM advocate, so if you need help, you can hop onto IRC
> > (irc.perl.org) and look for #mst. If you still find trouble with
> > cross-platform building, you can also switch to Module::Build. You'll
> > need to sub-class M::B so that it handles the additional build target;
> > Joel Berger will likely be a good resource if you decide to go that
> > route, and I will be able to give you the proper incantation for your
> > .pp file. (Hint, you'd move it and rename it to
> > lib/PDL/Probability.pm.PL)
>
> I'm leaning towards M::B since I've used it before and I understand perl
> better than Makefiles. I suppose you mean your Module::Build::PDL when
> you say M::B? The only reason I used EU::MM was b/c that's what was doc'd
> in the PDL::PP manual page. I actually uploaded a 'test' module
> PDL::CholeskyPP to CPAN last night to see how the process all works, so
> I'll probably convert that one first.
>
> Do you think I should replace the current mechanism of
> C->testvalues->perl-test-script with a single script that uses Inline::C?
> Inline is a prereq for PDL anyways. I don't, however, know enough about
> Inline::C to know the disadvantages of using it during the test phase.
>
> > My only real point of confusion: why do you require 5.10? Why not 5.8.8?
>
> This is temporary -- I started in Perl after 5.10 was released and have
> never had to deal with pre-5.10 perl installations, so didn't want to
> think about potential related issues yet. I'll bring it down to 5.8.
>
> > Looks great!
> > David
>
> thanks!
> Tom
>
> _______________________________________________
> Perldl mailing list
> [email protected]
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>
