Re: [Perldl] Requesting criticism: PDL binding to GSL's randist functions

Tom Nishimura Fri, 06 Jul 2012 17:29:24 -0700

David, 
Thank you for your review and response. I've inlined my
thoughts/questions as well:

> On Tue, Jul 3, 2012 at 9:31 PM, Tom Nishimura <[email protected]> wrote:
> 
>> Hello Piddlers,
>>
>> I'm working on a module called PDL::Probability which hopes to bind GSL's
>> randist functions and provide a clean interface to them.  It is in its
>> very early stages and is viewable at (most of the work has been in
>> the GSL/ subdir):
>>
>> https://github.com/tnishimura/PDL-Probability
>>
> 
> Project layout looks decent. These days I use Module::Build for all of my
> PDL::PP code because I feel that M::B makes it much easier to manage
> everything. I'll be happy to help you switch over to M::B if you would
> like to try that. Handling your C-level tests is a little tricky, though;
> see my comments at the end.

Please see questions about this at the end.

>> Most of the binding (namely, for the univariate distributions) is
>> dynamically generated by GSL/t/gsl_randist.pp using an annotation file
>> that I created:
>>
>>
>> https://github.com/tnishimura/PDL-Probability/blob/master/GSL/share/gsl_randist.yml
>>
> 
> Excellent. This is a great way to handle this sort of thing. Does this
> mean that adding new functions (as GSL's random distributions grow) is
> simply a matter of adding something to the YAML file?

Yes, as long as it's univariate and follows the same conventions as the
others.  I haven't done this yet, but I should go through the GSL
changelogs and add a 'version' entry to each distribution in the YAML to
signify the lowest version of GSL that has the distribution, and have
gsl_randist.pp generate bindings accordingly.
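
A minimal sketch of that version gating, assuming the YAML has already
been parsed into a hash. The distribution names, the version numbers, and
the helper names (version_at_least, dists_for_gsl) are placeholders for
illustration, not what gsl_randist.pp does today:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parsed annotation file: each distribution
# carries the lowest GSL version that provides it. Names and version
# numbers are placeholders, not taken from the GSL changelogs.
my %dists = (
    gaussian => { version => '1.0'  },
    some_new => { version => '1.15' },
);

# True when $have is at least $need. Versions are compared component by
# component so that 1.9 counts as older than 1.13.
sub version_at_least {
    my ($have, $need) = @_;
    my @h = map { int($_) } split /\./, $have;
    my @n = map { int($_) } split /\./, $need;
    while (@h || @n) {
        my $x = @h ? shift @h : 0;
        my $y = @n ? shift @n : 0;
        return $x > $y if $x != $y;
    }
    return 1;
}

# Names of the distributions a generator should emit bindings for.
sub dists_for_gsl {
    my ($gsl_version) = @_;
    my @ok = grep { version_at_least($gsl_version, $dists{$_}{version}) }
        keys %dists;
    return sort @ok;
}

print join(', ', dists_for_gsl('1.13')), "\n";
```

Comparing component by component matters here because a plain numeric
comparison would treat 1.9 as newer than 1.13.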

>> 2. Bind samplers (gsl_ran_*'s) in a more "threadable" way.  The sampler
>> bindings in PDL::GSL::RNG sometimes place parameters in OtherPars,
>> making them non-threadable (like gsl_ran_tdist()'s second parameter).
>>
> 
> The gsl_randist.pp file uses a great set of machinery to generate
> everything. This is well implemented. It must have been hard work, but it
> looks like it's paid off nicely. I had to create similar machinery for
> PDL::Drawing::Prima, though I wasn't smart enough to YAMLify my data. I
> may consider trying that for my work.

Thanks to the GSL people for making a mostly-consistent interface.

>> 3. Test by comparing to C-calls.  'make test' runs the C version of
>> every GSL randist function at multiple values, and the output of the
>> C calls is compared to the PDL versions (in GSL/t/compare-to-c.t). Right
>> now, PDL::GSL::RNG only tests that the functions don't die (in
>> gsl_rng.t).
> 
> Excellent test suite, much more robust than the current one.

Thanks, this part was the hardest.
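
Roughly, the comparison step looks like the sketch below. This is not the
actual compare-to-c.t: the reference rows are standard normal density
values written out by hand rather than captured from a GSL run, and the
pure-Perl gaussian_pdf is only a stand-in for the generated PDL function:

```perl
use strict;
use warnings;
use Test::More;

# Reference rows as a C helper might print them: function name, argument,
# parameter, value. Hand-written for illustration, not real GSL output.
my @reference = (
    [ 'gaussian_pdf', 0, 1, 0.3989422804014327 ],
    [ 'gaussian_pdf', 1, 1, 0.2419707245191434 ],
    [ 'gaussian_pdf', 2, 1, 0.0539909665131881 ],
);

# Placeholder for the binding under test; a real test would call the PDL
# function generated from the annotation file instead.
my %binding = (
    gaussian_pdf => sub {
        my ($x, $sigma) = @_;
        my $pi = 4 * atan2(1, 1);
        return exp(-$x**2 / (2 * $sigma**2)) / ($sigma * sqrt(2 * $pi));
    },
);

# Relative comparison, with an absolute fallback when the target is zero.
sub close_enough {
    my ($got, $want, $tol) = @_;
    return abs($got - $want) <= $tol * abs($want) if $want != 0;
    return abs($got) <= $tol;
}

for my $row (@reference) {
    my ($name, $x, $param, $want) = @$row;
    my $got = $binding{$name}->($x, $param);
    ok(close_enough($got, $want, 1e-12), "$name($x, $param) matches C value");
}

done_testing();
```

A relative tolerance seems the safer default, since density values in the
tails get very small and an absolute tolerance would pass almost anything
there.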

>> 5. In the future I'd like to create similar sets of functions for
>> distributions not in GSL, like multivariate normal and t-distributions.
>> My original intention was to create the ultimate probability function
>> library (not just those in GSL), thus the module name.  Maybe this is
>> better done in separate modules.
>>
> 
> If you have the knowledge to create such functions (i.e. you know the
> algorithms), I'd be delighted. I don't use probability functions very
> often, but others in my lab do---via Python. Having a full set of
> probability functions for various statistical tests would be enormously
> helpful when I start doing such calculations, or if I ever try to convince
> some of them to try Perl/PDL.

If I'm going to keep the name 'PDL::Probability', I wouldn't want it to
be missing any "must-haves" that would discourage its use.  Right now the
list is multivariate normal and t, and perhaps inverse-gamma, Wishart,
Frechet, and a few others.

However, there are way too many "esoteric" distributions for me to
attempt to implement (and statistically test, which is much harder)
myself.  I'm thinking of creating a sub-namespace like
PDL::Probability::Distributions or something (suggestions welcome) to
hold non-GSL distribution functions, so that other modules can cleanly
add to the namespace.  I haven't thought about how to generate the
alternative interfaces for such functions, though... perhaps make those
dists provide a YAML file of their own, or maybe let them register
alternate interfaces themselves.  This is all half-baked.

> If you want to explore potential architectures while at the same time
> being particularly cautious and backward compatible for early adopters,
> you could name your first try with something like
> PDL::Probability::<Module-name>::RC1. After feedback from the real world,
> if you find that this module has the interface you like, you simply
> rebrand it as PDL::Probability::<Module-name> and fill the original RC1
> modules with wrappers to the new function names. If you change your mind
> on an API detail, you can release RC2, leaving RC1 on CPAN for backwards
> compatibility. That way people can start using your work and giving
> feedback without having to fear their code breaking when you change
> something.

I've never seen that done on CPAN (though I'm usually not an early
adopter), but that does seem like a good idea.
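
If I understand the suggestion, the retired RC1 module would end up as a
thin layer of forwarding subs.  A rough sketch, where the package names,
the old/new function names, and the rename table are all made up for
illustration:

```perl
use strict;
use warnings;

# Hypothetical final module, after the interface has settled.
package PDL::Probability::Example;

sub density { my ($x) = @_; return "density($x)" }

# Hypothetical release-candidate module kept for early adopters. Its old
# function names now just forward to the final ones.
package PDL::Probability::Example::RC1;

my %renamed = ( pdf => 'density' );   # old name => new name

for my $old (keys %renamed) {
    my $new = PDL::Probability::Example->can($renamed{$old})
        or die "no such function: $renamed{$old}";
    no strict 'refs';
    # Install a wrapper under the old name in the RC1 package.
    *{ __PACKAGE__ . "::$old" } = sub { $new->(@_) };
}

package main;

print PDL::Probability::Example::RC1::pdf(0.5), "\n";
```

Code written against the RC1 names keeps working, while new code can call
the final names directly.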

>> Please note that all development is being done with Linux (centos 5 and
>> slackware 13.37), perl 5.12+, PDL 2.4.10+, gsl 1.13+.  I don't think
>> it'll work on windows yet because of hackery in the makefile.
>>
> 
> I've seen much crazier hackery, and I believe that this should work with
> little or no modification on Windows. Matt Trout is both a Perl god and
> an EU::MM advocate, so if you need help, you can hop onto IRC
> (irc.perl.org) and look for #mst. If you still find trouble with
> cross-platform building, you can also switch to Module::Build. You'll need
> to sub-class M::B so that it handles the additional build target; Joel
> Berger will likely be a good resource if you decide to go that route, and
> I will be able to give you the proper incantation for your .pp file.
> (Hint, you'd move it and rename it to lib/PDL/Probability.pm.PL)

I'm leaning towards M::B since I've used it before and I understand Perl
better than Makefiles.  I suppose you mean your Module::Build::PDL when
you say M::B?  The only reason I used EU::MM was because that's what was
documented in the PDL::PP manual page.  I actually uploaded a 'test'
module, PDL::CholeskyPP, to CPAN last night to see how the whole process
works, so I'll probably convert that one first.

Do you think I should replace the current mechanism of
C->testvalues->perl-test-script with a single script that uses Inline::C?
Inline is a prereq for PDL anyway.  I don't, however, know enough about
Inline::C to know the disadvantages of using it during the test phase.

> My only real point of confusion: why do you require 5.10? Why not 5.8.8?

This is temporary -- I started with Perl after 5.10 was released and have
never had to deal with pre-5.10 perl installations, so I didn't want to
think about potential related issues yet.  I'll bring it down to 5.8.

> Looks great!  David

thanks!
Tom

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
