Hi Tom,

It's great to see more complete GSL probability support in PDL. I initially
built the GSL CDF bindings because I needed to look up probabilities for
statistical tests. Then it was "what the heck, I've got this far" and I
built some distribution fitting around that, but it was definitely not as
comprehensive as it could be.

The only hesitation I have about the PDL::Probability namespace with regard
to the GSL bindings is that there is already a PDL::GSL namespace, and it
makes the relationship to GSL clear. That's why the CDF binding is
PDL::GSL::CDF instead of PDL::Stats::CDF. In an ideal world we would have
complete PDL bindings for GSL. Should we keep the PDL::GSL namespace for
all things GSL, or are we okay with scattering GSL across different PDL
packages?


Best,
Maggie


On Fri, Jul 6, 2012 at 8:10 PM, Tom Nishimura <[email protected]> wrote:

> David,
> Thank you for your review and response. I've inlined my thoughts/questions
> as well:
>
> > On Tue, Jul 3, 2012 at 9:31 PM, Tom Nishimura <[email protected]> wrote:
> >
> >> Hello Piddlers,
> >>
> >> I'm working on a module called PDL::Probability which hopes to bind GSL's
> >> randist functions and provide a clean interface to them.  It is in its
> >> very early stages and is viewable at (most of the work has been in
> >> the GSL/ subdir):
> >>
> >> https://github.com/tnishimura/PDL-Probability
> >>
> >
> > Project layout looks decent. These days I use Module::Build for all of my
> > PDL::PP code because I feel that M::B makes it much easier to manage
> > everything. I'll be happy to help you switch over to M::B if you would
> > like to try that. Handling your C-level tests is a little tricky, though;
> > see my comments at the end.
>
> Please see questions about this at the end.
>
> >> Most of the binding (namely, for the univariate distributions) is
> >> dynamically generated by GSL/t/gsl_randist.pp using an annotation file
> >> that I created:
> >>
> >> https://github.com/tnishimura/PDL-Probability/blob/master/GSL/share/gsl_randist.yml
> >>
> >
> > Excellent. This is a great way to handle this sort of thing. Does this
> > mean that adding new functions (as GSL's random distributions grow) is
> > simply a matter of adding something to the YAML file?
>
> Yes, as long as it's univariate and it follows the same conventions as the
> others.  I haven't done this, but I should go through the GSL changelogs and
> add a 'version' entry to each distribution in the yaml to signify the lowest
> version of gsl which has the distribution, and have gsl_randist.pp generate
> accordingly.
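
A minimal sketch of the version-gating idea above, in plain Perl. The
distribution entries, the 'version' values, and the helper names are
hypothetical placeholders rather than the contents of gsl_randist.yml, and
the data is inlined as a hash instead of being parsed from YAML. Versions
are compared component by component, since treating them as decimals would
rank 1.9 above 1.13.

```perl
use strict;
use warnings;

# Hypothetical annotations; the names and 'version' values are placeholders,
# not real GSL release data.
my %dists = (
    gaussian => { params => ['sigma'],  version => '1.0'  },
    tdist    => { params => ['nu'],     version => '1.0'  },
    newdist  => { params => ['a', 'b'], version => '1.15' },
);

# Compare dotted version strings numerically, one component at a time.
# Returns -1, 0 or 1, like <=>.
sub version_cmp {
    my @a = split /\./, $_[0];
    my @b = split /\./, $_[1];
    while (@a || @b) {
        my $cmp = (shift(@a) || 0) <=> (shift(@b) || 0);
        return $cmp if $cmp;
    }
    return 0;
}

# Names of the distributions whose minimum version is satisfied by the
# given GSL version, in alphabetical order.
sub available_dists {
    my ($gsl_version) = @_;
    my @ok = grep { version_cmp($dists{$_}{version}, $gsl_version) <= 0 }
             keys %dists;
    return sort @ok;
}

print join(', ', available_dists('1.13')), "\n";   # gaussian, tdist
```

With these made-up entries, a build against GSL 1.13 would generate
bindings for gaussian and tdist and skip newdist.
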
>
> >> 2. Bind samplers (gsl_ran_*'s) in a more "threadable" way.  The sampler
> >> bindings in PDL::GSL::RNG sometimes place parameters in OtherPars,
> >> making them non-threadable (like gsl_ran_tdist()'s second parameter).
> >>
> >
> > The gsl_randist.pp file uses a great set of machinery to generate
> > everything. This is well implemented. It must have been hard work, but it
> > looks like it's paid off nicely. I had to create similar machinery for
> > PDL::Drawing::Prima, though I wasn't smart enough to YAMLify my data. I
> > may consider trying that for my work.
>
> Thanks to the GSL people for making a mostly-consistent interface.
>
> >> 3. Test by comparing to C-calls.  'make test' runs the C-version of
> >> every GSL randist function at multiple values, and the output of the
> >> C-calls is compared to the PDL-versions (in GSL/t/compare-to-c.t). Right
> >> now, PDL::GSL::RNG only tests that the functions don't die (in
> >> gsl_rng.t).
> >
> > Excellent test suite, much more robust than the current one.
>
> Thanks, this part was the hardest.
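
A minimal sketch of the compare-to-C approach described above, assuming
the C side prints one "x sigma expected" line per test point. That line
format, the helper names, and the tolerance are hypothetical and are not
taken from GSL/t/compare-to-c.t. A pure-Perl Gaussian density stands in
for the PDL binding so the sketch runs without PDL or GSL installed.

```perl
use strict;
use warnings;

# Stand-in for the PDL binding under test: Gaussian density with mean 0.
sub gaussian_pdf {
    my ($x, $sigma) = @_;
    my $pi = 4 * atan2(1, 1);
    return exp(-($x * $x) / (2 * $sigma * $sigma)) / ($sigma * sqrt(2 * $pi));
}

# True when $got is within $tol of $expected; the tolerance is scaled by
# |expected| when that exceeds 1.
sub approx_eq {
    my ($got, $expected, $tol) = @_;
    my $scale = abs($expected) > 1 ? abs($expected) : 1;
    return abs($got - $expected) <= $tol * $scale;
}

# Hypothetical reference lines, as a C program calling
# gsl_ran_gaussian_pdf(x, sigma) might print them.
my @reference = (
    '0 1 0.3989422804014327',
    '1 1 0.24197072451914337',
    '2 1 0.05399096651318806',
);

# Return the reference lines that the Perl side fails to reproduce.
sub mismatches {
    my ($tol, @lines) = @_;
    my @bad;
    for my $line (@lines) {
        my ($x, $sigma, $expected) = split ' ', $line;
        push @bad, $line
            unless approx_eq(gaussian_pdf($x, $sigma), $expected, $tol);
    }
    return @bad;
}

my @bad = mismatches(1e-12, @reference);
print scalar(@bad), " mismatches\n";
```

In a real test script each reference line would become one test, so that
a failure names the function and the input that disagreed.
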
>
> >> 5. In the future I'd like to create similar sets of functions for
> >> distributions not in GSL, like multivariate normal and t-distributions.
> >> My original intention was to create the ultimate probability function
> >> library (not just those in GSL), thus the module name.  Maybe this is
> >> better done in separate modules.
> >>
> >
> > If you have the knowledge to create such functions (i.e. you know the
> > algorithms), I'd be delighted. I don't use probability functions very
> > often, but others in my lab do---via Python. Having a full set of
> > probability functions for various statistical tests would be enormously
> > helpful when I start doing such calculations, or if I ever try to convince
> > some of them to try Perl/PDL.
>
> If I'm going to keep the name 'PDL::Probability', I wouldn't want it to be
> missing any "must-haves" that would discourage its use.  Right now the list
> is mv normal and t, and perhaps inverse-gamma, wishart, frechet, and a few
> others.
>
> However, there are way too many "esoteric" distributions for me to attempt to
> implement (and statistically test, which is much harder) myself.  I'm
> thinking of creating a sub-namespace like PDL::Probability::Distributions or
> something (suggestions welcome) to place non-GSL distribution functions, so
> that other modules can cleanly add to the namespace.  Haven't thought about
> how to generate the alternative interfaces for such functions, though...
> perhaps make those dists provide a yaml of their own, or maybe let them
> register alternate interfaces themselves.  This is all half-baked.
>
> > If you want to explore potential architectures while at the same time
> > being particularly cautious and backward compatible for early adopters,
> > you could name your first try with something like
> > PDL::Probability::<Module-name>::RC1. After feedback from the real world,
> > if you find that this module has the interface you like, you simply
> > rebrand it as PDL::Probability::<Module-name> and fill the original RC1
> > modules with wrappers to the new function names. If you change your mind
> > on an API detail, you can release RC2, leaving RC1 on CPAN for backwards
> > compatibility. That way people can start using your work and giving
> > feedback without having to fear their code breaking when you change
> > something.
>
> I've never seen that done on CPAN (though I'm usually not an early adopter),
> but that does seem like a good idea.
>
> >> Please note that all development is being done with Linux (centos 5 and
> >> slackware 13.37), perl 5.12+, PDL 2.4.10+, gsl 1.13+.  I don't think
> >> it'll work on windows yet because of hackery in the makefile.
> >>
> >
> > I've seen much crazier hackery, and I believe that this should work with
> > little or no modification on Windows. Matt Trout is both a Perl god and
> > an EU::MM advocate, so if you need help, you can hop onto IRC
> > (irc.perl.org) and look for #mst. If you still find trouble with
> > cross-platform building, you can also switch to Module::Build. You'll need
> > to sub-class M::B so that it handles the additional build target; Joel
> > Berger will likely be a good resource if you decide to go that route, and
> > I will be able to give you the proper incantation for your .pp file.
> > (Hint, you'd move it and rename it to lib/PDL/Probability.pm.PL)
>
> I'm leaning towards M::B since I've used it before and I understand perl
> better than Makefiles.  I suppose you mean your Module::Build::PDL when you
> say M::B?  The only reason I used EU::MM was b/c that's what was doc'd in the
> PDL::PP manual page.  I actually uploaded a 'test' module PDL::CholeskyPP to
> cpan last night to see how the process all works, so I'll probably convert
> that one first.
>
> Do you think I should replace the current mechanism of
> C->testvalues->perl-test-script with a single script that uses Inline::C?
> Inline is a prereq for PDL anyways.  I don't, however, know enough about
> Inline::C to know the disadvantages of using it during the test phase.
>
> > My only real point of confusion: why do you require 5.10? Why not 5.8.8?
>
> This is temporary -- I started in Perl after 5.10 was released and have
> never had to deal with pre-5.10 perl installations, so didn't want to think
> about potential related issues yet.  I'll bring it down to 5.8.
>
> > Looks great!  David
>
> thanks!
> Tom
>
> _______________________________________________
> Perldl mailing list
> [email protected]
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>