> On Sun, Dec 14, 2014 at 11:56 PM, Zakariyya Mughal <zaki.mug...@gmail.com>
wrote:
>
> ...snip...
>
> ## Levels of measurement
>
>   When using R, one of the nice things it does is warn or give an error
>   when you try to do an operation that would be invalid on a certain
>   type of data. One such type of data is categorical data, which R calls
>   factors and for which I made a subclass of PDL called PDL::Factor.
>   Some of this behaviour is inspired by the statistical methodology of
>   levels of measurement
>   <https://en.wikipedia.org/wiki/Level_of_measurement>. I believe SAS
>   even explicitly allows assigning levels of measurement to variables.

+1, it would be nice if new PDL types supported varying levels of
computation, including by level of measurement.
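To make that concrete, here is a rough sketch of the kind of guard a
level-of-measurement-aware type could provide (plain Perl; the class and
methods are invented for illustration and are not the actual
PDL::Factor API):

```perl
use strict;
use warnings;

# Hypothetical wrapper: refuse numeric aggregation on nominal data.
package My::Measured;
sub new {
    my ($class, %args) = @_;
    # level is one of: nominal, ordinal, interval, ratio
    return bless { data => $args{data}, level => $args{level} }, $class;
}
sub mean {
    my ($self) = @_;
    die "mean() is not meaningful for nominal data\n"
        if $self->{level} eq 'nominal';
    my $sum = 0;
    $sum += $_ for @{ $self->{data} };
    return $sum / @{ $self->{data} };
}

package main;
my $color = My::Measured->new( data => [qw(red blue red)], level => 'nominal' );
my $temp  = My::Measured->new( data => [ 270, 280, 290 ],  level => 'ratio' );
print $temp->mean, "\n";   # 280
eval { $color->mean };     # dies: mean() is not meaningful for nominal data
print $@;
```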

> ...snip...
>
>   `NA` is R's equivalent of `BAD` values. For `mean()` this makes sense
>   for categorical data. For logical vectors, it does something else:

I would like to see more generalized support for bad value computations,
since in some cases BAD is used for missing data, and in others BAD is
used for invalid data, ...
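For reference, the single flavour of BAD we have today looks like this
(stock PDL, assuming bad-value support was compiled in, which is the
default):

```perl
use PDL;

my $x = sequence(5);   # [0 1 2 3 4]
$x->setbadat(4);       # mark the last element BAD (missing? invalid? we can't say)
print $x, "\n";        # [0 1 2 3 BAD]
print $x->avg, "\n";   # 1.5 -- BAD elements are skipped
print $x->nbad, "\n";  # 1
```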

> ...snip...
>
>   Thinking in terms of levels of measurement can help with another
>   experiment I'm doing, which is based around tracking the units of
>   measure used for numerical things in Perl. Code is here
>   <https://github.com/zmughal/units-experiment/blob/master/overload_override.pl>.
>
>   What I do there is use Moo roles to add a unit attribute to numerical
>   types (Perl scalars, Number::Fraction, PDL, etc.) and whenever they go
>   through an operation, by either operator overloading or calling a
>   function such as `sum()`, the unit will be carried along with it and
>   be manipulated appropriately (you can take the mean of Kelvin, but not
>   degrees Celsius). I know that units of measure are messy to implement,
>   but being able to support auxiliary operations like this will go a
>   long way to making PDL flexible.

Yes!  The use of method modifiers offers some powerful development
tools for implementing various high-level features.  I'm hoping that
they can be used to augment core functionality to support many of
the more powerful or flexible features such as JIT compiling, GPU
computation, distributed computation, ...
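For instance, a minimal sketch of the unit-carrying idea using a Moo
role and an `around` modifier (the package names here are hypothetical,
not taken from Zaki's experiment):

```perl
use strict;
use warnings;

package My::Role::HasUnit;
use Moo::Role;

requires 'sum';
has unit => ( is => 'rw', default => sub { '' } );

# Carry the unit through an aggregate operation via a method modifier.
around sum => sub {
    my ( $orig, $self, @args ) = @_;
    my $result = $self->$orig(@args);
    return { value => $result, unit => $self->unit };
};

package My::Series;
use Moo;

has values => ( is => 'ro', default => sub { [] } );

sub sum {
    my ($self) = @_;
    my $total = 0;
    $total += $_ for @{ $self->values };
    return $total;
}

with 'My::Role::HasUnit';

package main;
my $lengths = My::Series->new( values => [ 1, 2, 3 ], unit => 'm' );
my $sum = $lengths->sum;
printf "%s %s\n", $sum->{value}, $sum->{unit};   # 6 m
```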
>
>   [Has anyone used udunits2? I made an Alien package for it. It's on
>   CPAN.]
>
> ## DataShape and Blaze

This looks a lot like what the PDL::Tiny core is shaping up to be.
Another goal of PDL::Tiny is flexibility, so that PDL can use other
languages and be used by/from them.

>   I think it would be beneficial to look at the work being done by the
>   Blaze project <http://blaze.pydata.org/> with its DataShape
>   specification <http://datashape.pydata.org/>. The idea behind it is to
>   be able to use the various array-like APIs without having to worry
>   about what is going on in the backend, be it CPU-based, GPU-based,
>   SciDB, or even a SQL server.
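In Moo terms, that backend-agnostic contract might be sketched like
this (all package names hypothetical):

```perl
use strict;
use warnings;

package My::Role::NDArrayBackend;
use Moo::Role;

# The contract any backend must fulfil, whether it computes on the
# CPU, a GPU, or by generating SQL behind the scenes.
requires qw(shape at);

package My::Backend::InMemory;
use Moo;
with 'My::Role::NDArrayBackend';

has data => ( is => 'ro', required => 1 );

sub shape { my ($self) = @_; return scalar @{ $self->data } }
sub at    { my ( $self, $i ) = @_; return $self->data->[$i] }

package main;
# Caller code relies only on the role, not on the concrete backend.
my $a = My::Backend::InMemory->new( data => [ 10, 20, 30 ] );
printf "shape=%d first=%d\n", $a->shape, $a->at(0);
```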
>
> ## Julia
>
>   Julia has been doing some amazing things with how they've grown out
>   their language. I was looking to see if they have anything similar to
>   the dataflow in PDL and I came across ArrayViews
>   <https://github.com/JuliaLang/ArrayViews.jl>. It may be enlightening
>   to see how they compose this feature onto already existing n-d arrays
>   as opposed to how PDL does it.
>
>   I do not know what tradeoffs that brings, but it is a starting point
>   to think about. I think similar approaches can be made to support
>   sparse arrays.

Julia views look a lot like what we call slices.
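For comparison, this is the dataflow behaviour of a PDL slice, which is
what the Julia views resemble (plain PDL, runs as-is):

```perl
use PDL;

my $x = sequence(2, 3);     # a 2x3 ndarray
my $s = $x->slice('0,:');   # a view, not a copy: dataflow is live
$s .= 99;                   # writing to the slice...
print $x;                   # ...updates the parent as well
```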

>   In fact, one of Julia's strengths is how they use multimethods to
>   handle new types with ease. See "The Design Impact of Multiple
>   Dispatch"
>   <http://nbviewer.ipython.org/gist/StefanKarpinski/b8fe9dbb36c1427b9f22>
>   for examples. [Perl 6 has built-in multimethods.]

Multi-methods may be a good way to support some of the new PDL
capabilities in a way that can be expanded by plugins, at runtime,
...
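Lacking built-in multimethods in Perl 5, a plugin-extensible dispatch
table gives a flavour of the idea (a toy sketch; Class::Multimethods on
CPAN or Perl 6's `multi sub` provide the real thing):

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package Dense;  sub new { bless {}, shift }
package Sparse; sub new { bless {}, shift }

package main;

# Toy multimethod table: dispatch on the classes of *both* operands.
# A plugin could add new entries at runtime to handle new array types.
my %combine_impl = (
    'Dense,Dense'   => sub { 'dense kernel' },
    'Dense,Sparse'  => sub { 'mixed kernel' },
    'Sparse,Sparse' => sub { 'sparse kernel' },
);

sub combine {
    my ( $left, $right ) = @_;
    my $key  = join ',', sort( blessed($left), blessed($right) );
    my $impl = $combine_impl{$key} or die "no combine() for ($key)\n";
    return $impl->( $left, $right );
}

print combine( Dense->new,  Sparse->new ), "\n";   # mixed kernel
print combine( Sparse->new, Sparse->new ), "\n";   # sparse kernel
```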


> ## MATLAB subclassing
>
> ...snip...
>
> ## GPU and threading
>
>   I think it would be best to offload GPU support to other libraries,
>   so it would be good to extract what is common between libraries like
>
>   - MAGMA <http://icl.cs.utk.edu/magma/>,
>   - ViennaCL <http://viennacl.sourceforge.net/>,
>   - Blaze-lib  <https://code.google.com/p/blaze-lib/>,
>   - VXL <http://vxl.sourceforge.net/>,
>   - Spark <http://spark.apache.org/>,
>   - Torch <http://torch.ch/>,
>   - Theano <http://www.deeplearning.net/software/theano/>,
>   - Eigen <http://eigen.tuxfamily.org/>, and
>   - Armadillo <http://arma.sourceforge.net/>.
>
>   Eigen is interesting in particular because it has support for storing
>   data in both row-major and column-major order
>   <http://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html>.

We would benefit by supporting the commonalities needed to work
with other GPU computation libraries.  I'm not sure that all
PDL computations can be run efficiently if processed at the
library call level.  We may want our own JIT for performance.

>   Another source of inspiration would be the VSIPL spec
>   <http://www.omgwiki.org/hpec/vsipl>. It's a standard made for signal
>   processing routines in the embedded DSP world and comes with "Core"
>   and "Core Lite" profiles, which might help decide what should be
>   included in a smaller subset of PDL.
>
>   Also on my wishlist is interoperability with libraries like ITK
>   <http://www.itk.org/>, VTK <http://www.vtk.org/>, and yt
>   <http://yt-project.org/>. They have interesting architectures,
>   especially for computation. Unfortunately, the first two are C++
>   based and I don't have experience with combining C++ and XS.

Thanks for all the references and ideas!

> ## Better testing
>
>   PDL should make more guarantees about how types flow through the
>   system. This might be accomplished by adding assertions in the style
>   of Design-by-Contract, which can act as both a testable spec and
>   documentation. I'm working on the test suite right now on a branch
>   and I hope to create a proof-of-concept of this soon.

I think that by starting with the PDL::Tiny core and building out, we
could clarify some of these issues.
>
>   I hope that this can help make PDL more consistent and easily
>   testable. There are still small inconsistencies that shouldn't be
>   there, which can be weeded out with testing. For example, what type
>   is expected for this code?
>
>   ```perl
>   use PDL;
>   print stretcher( sequence(float, 3) )->type;
>   ```
>
>   I would expect 'float', but it is actually 'double' under PDL v2.007_04.

This is a bug.  One thing that would be nice to have is
a way to trace the dataflow characteristics through the
PDL processing chains...
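A contract-style test could pin the expected behaviour down while
documenting the bug (a sketch with Test::More; the TODO block marks the
current stretcher() behaviour):

```perl
use strict;
use warnings;
use Test::More;
use PDL;

# Contract-style assertions: a float input should yield a float output.
my $f = sequence( float, 3 );

is( "" . ( $f * 2 )->type, 'float', 'arithmetic preserves float' );

TODO: {
    local $TODO = 'stretcher() currently upgrades float to double';
    is( "" . stretcher($f)->type, 'float', 'stretcher preserves float' );
}

done_testing;
```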


> ## Incremental computation
>
>   I find that the way I grow my code is to slowly add modules that work
>   together in a pipeline. Running and rerunning this code through all the
>   modules is slow. To avoid that, I create multiple small programs that
read
>   and write files to pass from one script to the next. I was looking for a
>   solution and came across IncPy <http://www.pgbovine.net/incpy.html>. It
>   modifies the Python interpreter to support automatic persistent
memoization.
>   I don't think the idea has caught on, but I think it should and perhaps
Perl
>   and PDL is flexible enough to herald it as a CPAN module.

Nice idea for improvement and ease of use.  If PDL methods are
implemented to be compatible with Moo[se], then method modifiers
could be used for this.
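Some of this is within reach today using Memoize's persistent cache
hooks; a minimal sketch (the cache file name is arbitrary):

```perl
use strict;
use warnings;
use Memoize;
use Memoize::Storable;

# Persist results across runs: a later invocation of the script reads
# 'slow_step.cache' instead of recomputing.
tie my %cache => 'Memoize::Storable', 'slow_step.cache';
memoize( 'slow_step', SCALAR_CACHE => [ HASH => \%cache ] );

sub slow_step {
    my ($n) = @_;
    sleep 2;   # stand-in for an expensive pipeline stage
    return $n * $n;
}

print slow_step(12), "\n";   # slow on the first run, instant afterwards
```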

Thanks for the thoughts!
Chris
_______________________________________________
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
