Re: [Perldl] Requesting criticism: PDL binding to GSL's randist functions

David Mertens Fri, 06 Jul 2012 08:24:09 -0700

Tom -

I finally had a chance to look over this. My responses are inline below.

On Tue, Jul 3, 2012 at 9:31 PM, Tom Nishimura <[email protected]> wrote:

> Hello Piddlers,
>
> I'm working on a module called PDL::Probability which hopes to bind
> GSL's randist functions and provide a clean interface to them.  It is in
> its very early stages and is viewable at (most of the work has been
> in the GSL/ subdir):
>
> https://github.com/tnishimura/PDL-Probability
>

Project layout looks decent. These days I use Module::Build for all of my
PDL::PP code because I feel that M::B makes it much easier to manage
everything. I'll be happy to help you switch over to M::B if you would like
to try that. Handling your C-level tests is a little tricky, though; see my
comments at the end.

> HTML documentation of the (dynamically generated) GSL-binding portion of
> the module is here:
>
> http://pdlprobability.nfshost.com/pdl-probability-gsl.html

This is pretty good. It's particularly helpful that you explain the naming
convention. I've seen docs that don't do that, so for a module that's "in
very early stages," I am happily surprised. The documentation can only get
better from here, and it's already off to a great start. :-)

> Most of the binding (namely, for the univariate distributions) is
> dynamically generated by GSL/t/gsl_randist.pp using an annotation file
> that I created:
>
>
> https://github.com/tnishimura/PDL-Probability/blob/master/GSL/share/gsl_randist.yml
>

Excellent. This is a great way to handle this sort of thing. Does this mean
that adding new functions (as GSL's random distributions grow) is simply a
matter of adding something to the YAML file?
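
To be clear about what I'm picturing, here is a toy version of table-driven
generation in plain Perl. It is not taken from your gsl_randist.pp: the %spec
hash only stands in for the parsed YAML, and the entries and the generated
function names are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for the parsed annotation file: one entry per distribution.
# Keys, parameter names and generated names are illustrative only.
my %spec = (
    exponential => {
        params => ['mu'],
        pdf    => sub { my ($x, $mu) = @_; exp(-$x / $mu) / $mu },
    },
    flat => {
        params => ['lo', 'hi'],
        pdf    => sub {
            my ($x, $lo, $hi) = @_;
            ($x >= $lo && $x < $hi) ? 1 / ($hi - $lo) : 0;
        },
    },
);

# Install one named function per entry, e.g. ran_exponential_pdf().
for my $name (sort keys %spec) {
    my $pdf      = $spec{$name}{pdf};
    my $n_params = @{ $spec{$name}{params} };
    no strict 'refs';
    *{"main::ran_${name}_pdf"} = sub {
        die "ran_${name}_pdf expects x plus $n_params parameter(s)\n"
            unless @_ == 1 + $n_params;
        return $pdf->(@_);
    };
}

# Adding a distribution means adding a %spec entry and nothing else.
print ran_exponential_pdf(0, 2), "\n";
print ran_flat_pdf(0.5, 0, 1), "\n";
```

If your generator works roughly like this, then new GSL distributions should
only ever touch the data file.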

> For reference, this is the documentation to the thing I'm trying to
> bind:
>
>
> http://www.gnu.org/software/gsl/manual/html_node/Random-Number-Distributions.html
>

Note that in your generated docs (at pdlprobability...), you have a final
period in this link ("...Distributions.html."). GNU's web server doesn't
accept that URL, but it looks like you've already fixed that:
https://github.com/tnishimura/PDL-Probability/blob/master/GSL/gsl_randist.pp#L995

> Here's what I'm trying to do with this module:
>
> 1. Bind pdf (gsl_ran_*_pdf) functions.  Until now, only cdf's were
> bound, in Maggie X's PDL::Stats module.
>

> 2. Bind samplers (gsl_ran_*'s) in a more "threadable" way.  The sampler
> bindings in PDL::GSL::RNG sometimes place parameters in OtherPars,
> making them non-threadable (like gsl_ran_tdist()'s second parameter).
>

The gsl_randist.pp file uses a great set of machinery to generate
everything. This is well implemented. It must have been hard work, but it
looks like it's paid off nicely. I had to create similar machinery for
PDL::Drawing::Prima, though I wasn't smart enough to YAMLify my data. I may
consider trying that for my work.

> 3. Test by comparing to C calls.  'make test' runs the C version of
> every GSL randist function at multiple values, and the output of the
> C calls is compared to the PDL versions (in GSL/t/compare-to-c.t). Right
> now, PDL::GSL::RNG only tests whether the functions don't die (in
> gsl_rng.t).
>

Excellent test suite, much more robust than the current one.

> 4. Provide option for alternate interfaces.  You can see that I have a
> skeleton for PDL::Probability::RLike, which provides an R-language like
> interface to the functions.  For example, sampling 100 normals can be
> done like rnorm(100, mean => 10, sigma => 123), which is a wrapper for
> gsl_ran_gaussian().  I'm thinking of also providing an Octave-like
> interface and a NumPy-like interface.
>
> I think alternate interfaces are a good idea because every
> language/probability library has a different idea of how to name these
> functions.  John D. Cook, a statistician/programmer/blogger wrote about
> this very recently:
>
> http://www.johndcook.com/blog/2012/07/02/probability-function-names/
>

John Cook raises some interesting points. I really like the emacs
calculator function names. Creating different modules with different naming
schemes is both user-friendly and Perlish. :-)
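
For what it's worth, here is a minimal sketch of how thin such a naming layer
can be. Everything in it is hypothetical: the package name RLikeSketch, the
defaults, and especially the pure-Perl Box-Muller sampler, which only stands
in for the gsl_ran_gaussian() binding so that the example runs without GSL or
PDL (and therefore returns a plain list rather than a piddle).

```perl
package RLikeSketch;   # hypothetical name, not the real PDL::Probability::RLike
use strict;
use warnings;

# Stand-in sampler (Box-Muller). A real wrapper would call the
# gsl_ran_gaussian() binding here instead.
sub _gaussian {
    my ($sigma) = @_;
    my $u1 = 1 - rand();            # in (0, 1], so log() is safe
    my $u2 = rand();
    return $sigma * sqrt(-2 * log($u1)) * cos(2 * 3.14159265358979 * $u2);
}

# R-style front end: rnorm($n, mean => ..., sigma => ...)
sub rnorm {
    my ($n, %opt) = @_;
    my $mean  = exists $opt{mean}  ? $opt{mean}  : 0;   # assumed default
    my $sigma = exists $opt{sigma} ? $opt{sigma} : 1;   # assumed default
    return map { $mean + _gaussian($sigma) } 1 .. $n;
}

package main;

my @draws = RLikeSketch::rnorm(100, mean => 10, sigma => 123);
printf "drew %d samples\n", scalar @draws;
```

An Octave-like or NumPy-like module would be the same few lines with
different names and argument conventions on top of the same binding.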

> 5. In the future I'd like to create similar sets of functions for
> distributions not in GSL, like multivariate normal and t-distributions.
>  My original intention was to create the ultimate probability function
> library (not just those in GSL), thus the module name.  Maybe this is
> better done in separate modules.
>

If you have the knowledge to create such functions (i.e. you know the
algorithms), I'd be delighted. I don't use probability functions very
often, but others in my lab do---via Python. Having a full set of
probability functions for various statistical tests would be enormously
helpful when I start doing such calculations, or if I ever try to convince
some of them to try Perl/PDL.

If you want to explore potential architectures while at the same time being
particularly cautious and backward compatible for early adopters, you could
name your first try with something like
PDL::Probability::<Module-name>::RC1. After feedback from the real world,
if you find that this module has the interface you like, you simply rebrand
it as PDL::Probability::<Module-name> and fill the original RC1 modules
with wrappers to the new function names. If you change your mind on an API
detail, you can release RC2, leaving RC1 on CPAN for backwards
compatibility. That way people can start using your work and giving
feedback without having to fear their code breaking when you change
something.
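
To make the rebranding step concrete, here is a rough sketch with made-up
names: PDL::Probability::Example, normal_pdf, and the older name dnorm are all
hypothetical, and plain Perl scalars stand in for piddles. After promotion,
the RC1 package keeps its old function names alive as thin wrappers.

```perl
use strict;
use warnings;

# The promoted, final module (hypothetical name and API).
package PDL::Probability::Example;

sub normal_pdf {
    my ($x, $mean, $sigma) = @_;
    my $z = ($x - $mean) / $sigma;
    return exp(-0.5 * $z * $z) / ($sigma * sqrt(2 * 3.14159265358979));
}

# The old release candidate stays on CPAN; each old name simply
# forwards to its new counterpart, so early adopters' code keeps working.
package PDL::Probability::Example::RC1;

sub dnorm { return PDL::Probability::Example::normal_pdf(@_) }

package main;

# Old and new spellings give the same answer.
my $old = PDL::Probability::Example::RC1::dnorm(0, 0, 1);
my $new = PDL::Probability::Example::normal_pdf(0, 0, 1);
printf "%.6f %.6f\n", $old, $new;
```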

> I am requesting any and all criticism.  I haven't settled on anything,
> including the name of the module or the need for the module's existence yet.
> As for naming... I was thinking of PDL::GSL::Randist but I don't want to
> intrude into PDL's internal namespace.

Personally, I think this is fine. Do others have any opinions about this?

> And maybe I should send the
> testing portions to Maggie X for integration into PDL::Stats and scrap
> the rest.

Scrap the rest?! Are you kidding? This is some great work and should
definitely be put on CPAN! :-D

> Please note that all development is being done with Linux
> (CentOS 5 and Slackware 13.37), perl 5.12+, PDL 2.4.10+, gsl 1.13+.  I
> don't think it'll work on Windows yet because of hackery in the
> makefile.
>

I've seen much crazier hackery, and I believe that this should work with
little or no modification on Windows. Matt Trout is both a Perl god and an
EU::MM advocate, so if you need help, you can hop onto IRC (irc.perl.org)
and look for #mst. If you still find trouble with cross-platform building,
you can also switch to Module::Build. You'll need to sub-class M::B so that
it handles the additional build target; Joel Berger will likely be a good
resource if you decide to go that route, and I will be able to give you the
proper incantation for your .pp file. (Hint: you'd move it and rename it to
lib/PDL/Probability.pm.PL.)

My only real point of confusion: why do you require 5.10? Why not 5.8.8?

Looks great!
David

-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
