Tom - I finally had a chance to look over this. My responses are inline below.
On Tue, Jul 3, 2012 at 9:31 PM, Tom Nishimura <[email protected]> wrote:

> Hello Piddlers,
>
> I'm working on a module called PDL::Probability which hopes to bind
> GSL's randist functions and provide a clean interface to them. It is in
> its very early stages and the viewable at (most of the working has been
> in the GSL/ subdir):
>
> https://github.com/tnishimura/PDL-Probability
>

Project layout looks decent. These days I use Module::Build for all of my PDL::PP code because I feel that M::B makes it much easier to manage everything. I'll be happy to help you switch over to M::B if you would like to try that. Handling your C-level tests is a little tricky, though; see my comments at the end.

> HTML documentation of the (dynamically generated) GSL-binding portion of
> the module is here:
>
> http://pdlprobability.nfshost.com/pdl-probability-gsl.html

This is pretty good. It's particularly helpful that you explain the naming convention. I've seen docs that don't do that, so for a module that's "in very early stages," I am happily surprised. The documentation can only get better from here, and it's already at a great start. :-)

> Most of the binding (namely, for the univariate distributions) is
> dynamically generated by GSL/t/gsl_randist.pp using an annotation file
> that I created:
>
>
> https://github.com/tnishimura/PDL-Probability/blob/master/GSL/share/gsl_randist.yml
>

Excellent. This is a great way to handle this sort of thing. Does this mean that adding new functions (as GSL's random distributions grow) is simply a matter of adding something to the YAML file?

> For reference, this is the documentation to the thing I'm trying to
> bind:
>
>
> http://www.gnu.org/software/gsl/manual/html_node/Random-Number-Distributions.html
>

Note that in your generated docs (at pdlprobability...), you have a final period in this link "...Distributions.html."
GNU doesn't like that, but it looks like you've already fixed that:
https://github.com/tnishimura/PDL-Probability/blob/master/GSL/gsl_randist.pp#L995

> Here's what I'm trying to do with this module:
>
> 1. Bind pdf (gsl_ran_*_pdf) functions. Until now, only cdf's were
> bound, in Maggie X's PDL::Stats module.
>
> 2. Bind samplers (gsl_ran_*'s) in a more "threadable" way. The sampler
> bindings in PDL::GSL::RNG sometimes places parameters in OtherPars,
> making them not-threadable, like gsl_ran_tdist()'s second parameter).
>

The gsl_randist.pp file uses a great set of machinery to generate everything. This is well implemented. It must have been hard work, but it looks like it's paid off nicely. I had to create similar machinery for PDL::Drawing::Prima, though I wasn't smart enough to YAMLify my data. I may consider trying that for my work.

> 3. Test by comparing to C-calls. 'make test' runs the C-version of the
> every GSL randist functions at multiple values, and the ouptut of the
> C-calls are compared to PDL-versions (in GSL/t/compare-to-c.t). Right
> now, PDL::GSL::RNG only tests whether the functions don't die (in
> gsl_rng.t).
>

Excellent test suite, much more robust than the current one.

> 4. Provide option for alternate interfaces. You can see that I have a
> skeleton for PDL::Probability::RLike, which provides an R-language like
> interface to the functions. For example, sampling 100 normals can be
> done like rnorm(100, mean => 10, sigma => 123), which is a wrapper for
> gsl_ran_gaussian(). I'm thinking of also providing an Octave-like
> interface and an numpy like interface.
>
> I think alternate interfaces is a good idea b/c every
> language/probability library has a different idea of how to name these
> functions. John D. Cook, a statistician/programmer/blogger wrote about
> this very recently:
>
> http://www.johndcook.com/blog/2012/07/02/probability-function-names/
>

John Cook raises some interesting points.
I really like the Emacs calculator function names. Creating different modules with different naming schemes is both user-friendly and Perlish. :-)

> 5. In the future I'd like to create similar sets of functions for
> distributions not in GSL, like multivariate normal and t-distributions.
> My original intention was to create the ultimate probability function
> library (not just those in GSL), thus the module name. Maybe this is
> better done in separate modules.
>

If you have the knowledge to create such functions (i.e. you know the algorithms), I'd be delighted. I don't use probability functions very often, but others in my lab do---via Python. Having a full set of probability functions for various statistical tests would be enormously helpful when I start doing such calculations, or if I ever try to convince some of them to try Perl/PDL.

If you want to explore potential architectures while at the same time being particularly cautious and backward compatible for early adopters, you could name your first try with something like PDL::Probability::<Module-name>::RC1. After feedback from the real world, if you find that this module has the interface you like, you simply rebrand it as PDL::Probability::<Module-name> and fill the original RC1 modules with wrappers to the new function names. If you change your mind on an API detail, you can release RC2, leaving RC1 on CPAN for backwards compatibility. That way people can start using your work and giving feedback without having to fear their code breaking when you change something.

> I am requesting any and all criticism. I haven't settled on anything,
> including the name of the module or the need for modules existance yet.
> As for naming... I was thinking of PDL::GSL::Randist but I don't want to
> intrude into PDL's internal namespace.

Personally, I think this is fine. Do others have any opinions about this?
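
Going back to the RC1 idea for a moment, here is a rough sketch of what "fill the original RC1 modules with wrappers" might look like. Every name in it (PDL::Probability::Example, normal_pdf, gaussian_pdf) is made up for illustration and is not part of your module, and it uses plain Perl scalars instead of piddles so that it runs without PDL installed:

```perl
use strict;
use warnings;

# Hypothetical final namespace, holding the interface settled on after
# feedback. The names and the keyword-argument style are assumptions.
package PDL::Probability::Example;

sub normal_pdf {
    my ($x, %opt) = @_;
    my $mean  = exists $opt{mean}  ? $opt{mean}  : 0;
    my $sigma = exists $opt{sigma} ? $opt{sigma} : 1;
    my $pi    = 4 * atan2(1, 1);
    my $z     = ($x - $mean) / $sigma;
    return exp(-0.5 * $z * $z) / ($sigma * sqrt(2 * $pi));
}

# Frozen release-candidate namespace. It keeps the old function name and
# positional arguments, but only forwards to the final implementation, so
# early adopters' code keeps working.
package PDL::Probability::Example::RC1;

sub gaussian_pdf {
    my ($x, $sigma) = @_;
    return PDL::Probability::Example::normal_pdf($x, sigma => $sigma);
}

package main;

# Old-style and new-style calls give the same density at x = 0.
my $old = PDL::Probability::Example::RC1::gaussian_pdf(0, 1);
my $new = PDL::Probability::Example::normal_pdf(0);
printf "%.6f %.6f\n", $old, $new;
```

The point of the layout is that RC1 never changes its signatures once released; only the body of each wrapper is rewritten when the final API moves.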
> And maybe I should send the
> testing portions to Maggie X for integration into PDL::Stats and scrap
> the rest.

Scrap the rest?! Are you kidding? This is some great work and should definitely be put on CPAN! :-D

> Please note that all development is being done with Linux
> (centos 5 and slackware 13.37), perl 5.12+, PDL 2.4.10+, gsl 1.13+. I
> don't think it'll work on windows yet because of hackery in the
> makefile.
>

I've seen much crazier hackery, and I believe that this should work with little or no modification on Windows. Matt Trout is both a Perl god and an EU::MM advocate, so if you need help, you can hop onto IRC (irc.perl.org) and look for #mst. If you still find trouble with cross-platform building, you can also switch to Module::Build. You'll need to sub-class M::B so that it handles the additional build target; Joel Berger will likely be a good resource if you decide to go that route, and I will be able to give you the proper incantation for your .pp file. (Hint: you'd move it and rename it to lib/PDL/Probability.pm.PL)

My only real point of confusion: why do you require 5.10? Why not 5.8.8?

Looks great!
David

--
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan
