Agreed. The need to avoid cache-busting code and poor performance is one motivation for JIT compiling: it avoids the memory sloshing that comes from chasing function pointer->pointer->pointer chains. Implementing benchmarks and performance metrics alongside the new development will be essential to avoiding unnecessary performance bottlenecks and to determining the right level at which to compute...
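For example, a small harness kept alongside the new code could track this as development proceeds. A minimal sketch using the core Benchmark module (the two compared implementations are hypothetical stand-ins, not actual PDL internals):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use PDL;

# Hypothetical comparison: one fused arithmetic expression vs. an
# explicit chain of method calls over the same data, to watch for
# dispatch/indirection overhead as the implementation changes.
my $x = sequence(1_000_000);

cmpthese(-2, {    # run each variant for at least 2 CPU seconds
    fused   => sub { my $y = $x * 2 + 1 },
    chained => sub { my $y = $x->mult(2, 0)->plus(1, 0) },
});
```

cmpthese() prints a rate-comparison table, so regressions from new code paths show up directly in the numbers.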
--Chris

On Mon, Dec 15, 2014 at 2:00 PM, David Mertens <dcmertens.p...@gmail.com> wrote:
>
> Something that I think will be critical, especially if we start
> JIT-compiling stuff or allowing for subclassing, is that the
> customized code could lead to a performance hit if it leads to code
> cache misses. I recently came across a great explanation here:
> http://igoro.com/archive/gallery-of-processor-cache-effects/
>
> One of the files in the Perl interpreter's core code is called
> pp_hot.c. According to comments at the top of the file, these
> functions are consolidated into a single C (and later object) file to
> "encourage CPU cache hits on hot code." If we create more and more
> code paths that get executed, we increase the time spent loading the
> machine code into the L1 cache, and we also increase the likelihood
> of evicting parts of pp_hot and other important execution paths.
>
> David
>
> On Mon, Dec 15, 2014 at 12:45 PM, David Mertens <dcmertens.p...@gmail.com> wrote:
>>
>> FWIW, it looks like Julia views are like affine slices in PDL. As I
>> have said before, almost nothing out there has the equivalent of the
>> non-contiguous, non-strided support we get with which, where, and
>> their ilk. GSL vectors do not have it, either. Matlab only supports
>> it as a temporary object, and eliminates it after the line has
>> executed. Not sure about Numpy here.
>>
>> David
>>
>> On Mon, Dec 15, 2014 at 11:32 AM, Chris Marshall <devel.chm...@gmail.com> wrote:
>>
>>> > On Sun, Dec 14, 2014 at 11:56 PM, Zakariyya Mughal <zaki.mug...@gmail.com> wrote:
>>> >
>>> > ...snip...
>>> >
>>> > ## Levels of measurement
>>> >
>>> > When using R, one of the nice things it does is warn or give an
>>> > error when you try to do an operation that would be invalid on a
>>> > certain type of data. One such type of data is categorical data,
>>> > which R calls factors and for which I made a subclass of PDL
>>> > called PDL::Factor. Some of this behaviour is inspired by the
>>> > statistical methodology of levels of measurement
>>> > <https://en.wikipedia.org/wiki/Level_of_measurement>. I believe
>>> > SAS even explicitly allows assigning levels of measurement to
>>> > variables.
>>>
>>> +1, it would be nice if new PDL types supported varying levels of
>>> computation, including by levels of measurement.
>>>
>>> > ...snip...
>>> >
>>> > `NA` is R's equivalent of `BAD` values. For `mean()` this makes
>>> > sense for categorical data. For logical vectors, it does
>>> > something else:
>>>
>>> I would like to see more generalized support for bad-value
>>> computations, since in some cases BAD is used for missing values
>>> and in others BAD is used for invalid ones...
>>>
>>> > ...snip...
>>> >
>>> > Thinking in terms of levels of measurement can help with another
>>> > experiment I'm doing, which is based around tracking the units of
>>> > measure attached to numerical things in Perl. Code is here
>>> > <https://github.com/zmughal/units-experiment/blob/master/overload_override.pl>.
>>> >
>>> > What I do there is use Moo roles to add a unit attribute to
>>> > numerical types (Perl scalars, Number::Fraction, PDL, etc.), and
>>> > whenever they go through an operation, by either operator
>>> > overloading or a function call such as `sum()`, the unit is
>>> > carried along and manipulated appropriately (you can take the
>>> > mean of Kelvin, but not of degrees Celsius). I know that units of
>>> > measure are messy to implement, but being able to support
>>> > auxiliary operations like this will go a long way toward making
>>> > PDL flexible.
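As a rough illustration of the role/overloading idea above (this is not the actual overload_override.pl code; the Quantity class and its methods are invented for the sketch, and no unit simplification or affine Celsius/Kelvin handling is attempted):

```perl
package Quantity;
use Moo;
use overload '*' => \&_mul, '""' => \&_str;

has value => (is => 'ro', required => 1);
has unit  => (is => 'ro', default  => sub { '' });

sub _mul {
    my ($self, $other) = @_;
    # Multiply the numeric parts and combine the units verbatim; a
    # real implementation would simplify (m*m => m^2), handle plain
    # numbers, and reject operations invalid for the measurement level.
    return Quantity->new(
        value => $self->value * $other->value,
        unit  => $self->unit . '*' . $other->unit,
    );
}

sub _str { my ($self) = @_; $self->value . ' ' . $self->unit }

package main;
my $len = Quantity->new(value => 2, unit => 'm');
my $wid = Quantity->new(value => 3, unit => 'm');
print $len * $wid, "\n";    # prints "6 m*m"
```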
>>> Yes! Method modifiers offer some powerful development tools for
>>> implementing various high-level features. I'm hoping they can be
>>> used to augment core functionality to support many of the more
>>> powerful or flexible features such as JIT compiling, GPU
>>> computation, distributed computation, ...
>>>
>>> > [Has anyone used udunits2? I made an Alien package for it. It's
>>> > on CPAN.]
>>> >
>>> > ## DataShape and Blaze
>>>
>>> This looks a lot like what the PDL::Tiny core is shaping up to be.
>>> Another goal of PDL::Tiny is flexibility, so that PDL can use and
>>> be used by/from other languages.
>>>
>>> > I think it would be beneficial to look at the work being done by
>>> > the Blaze project <http://blaze.pydata.org/> with its DataShape
>>> > specification <http://datashape.pydata.org/>. The idea behind it
>>> > is to be able to use the various array-like APIs without having
>>> > to worry about what is going on in the backend, be it CPU-based,
>>> > GPU-based, SciDB, or even a SQL server.
>>> >
>>> > ## Julia
>>> >
>>> > Julia has been doing some amazing things with how they've grown
>>> > out their language. I was looking to see if they have anything
>>> > similar to the dataflow in PDL, and I came across ArrayViews
>>> > <https://github.com/JuliaLang/ArrayViews.jl>. It may be
>>> > enlightening to see how they compose this feature onto already
>>> > existing n-d arrays, as opposed to how PDL does it.
>>> >
>>> > I do not know what tradeoffs that brings, but it is a starting
>>> > point to think about. I think similar approaches can be taken to
>>> > support sparse arrays.
>>>
>>> Julia views look a lot like what we call slices.
>>>
>>> > In fact, one of Julia's strengths is how they use multimethods
>>> > to handle new types with ease. See "The Design Impact of Multiple
>>> > Dispatch"
>>> > <http://nbviewer.ipython.org/gist/StefanKarpinski/b8fe9dbb36c1427b9f22>
>>> > for examples. [Perl 6 has built-in multimethods.]
>>>
>>> Multimethods may be a good way to support some of the new PDL
>>> capabilities in a way that can be expanded by plugins, at
>>> runtime, ...
>>>
>>> > ## MATLAB subclassing
>>> >
>>> > ...snip...
>>> >
>>> > ## GPU and threading
>>> >
>>> > I think it would be best to offload GPU support to other
>>> > libraries, so it would be good to extract what is common between
>>> > libraries like
>>> >
>>> > - MAGMA <http://icl.cs.utk.edu/magma/>,
>>> > - ViennaCL <http://viennacl.sourceforge.net/>,
>>> > - Blaze-lib <https://code.google.com/p/blaze-lib/>,
>>> > - VXL <http://vxl.sourceforge.net/>,
>>> > - Spark <http://spark.apache.org/>,
>>> > - Torch <http://torch.ch/>,
>>> > - Theano <http://www.deeplearning.net/software/theano/>,
>>> > - Eigen <http://eigen.tuxfamily.org/>, and
>>> > - Armadillo <http://arma.sourceforge.net/>.
>>> >
>>> > Eigen is interesting in particular because it supports storing
>>> > data in both row-major and column-major order
>>> > <http://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html>.
>>>
>>> We would benefit from supporting the commonalities needed to work
>>> with other GPU computation libraries. I'm not sure that all PDL
>>> computations can be run efficiently if processed at the
>>> library-call level. We may want our own JIT for performance.
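One building block for that already ships with PDL: Inline::Pdlpp generates, compiles, and loads C code for a new PDL function at script run time. A minimal example in the spirit of its documentation (assumes a working C compiler and the Inline module are installed):

```perl
use strict;
use warnings;
use PDL;

# PDL::PP code is turned into C, compiled, and bound at run time.
use Inline Pdlpp => <<'END_OF_PP';
pp_def('inc',
    Pars => 'i(); [o] o()',
    Code => '$o() = $i() + 1;',
);
END_OF_PP

my $x = sequence(5);
print $x->inc, "\n";    # [1 2 3 4 5]
```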
>>>
>>> > Another source of inspiration would be the VSIPL spec
>>> > <http://www.omgwiki.org/hpec/vsipl>. It's a standard made for
>>> > signal processing routines in the embedded DSP world, and it
>>> > comes with "Core" and "Core Lite" profiles which might help
>>> > decide what should be included in a smaller subset of PDL.
>>> >
>>> > Also on my wishlist is interoperability with libraries like ITK
>>> > <http://www.itk.org/>, VTK <http://www.vtk.org/>, and yt
>>> > <http://yt-project.org/>. They have interesting architectures,
>>> > especially for computation. Unfortunately, the first two are
>>> > C++-based and I don't have experience with combining C++ and XS.
>>>
>>> Thanks for all the references and ideas!
>>>
>>> > ## Better testing
>>> >
>>> > PDL should make more guarantees about how types flow through the
>>> > system. This might be accomplished by adding assertions in the
>>> > style of Design-by-Contract, which can act as both a testable
>>> > spec and documentation. I'm working on the test suite right now
>>> > on a branch, and I hope to create a proof-of-concept of this
>>> > soon.
>>>
>>> I think that by starting with the PDL::Tiny core and building out,
>>> we could clarify some of these issues.
>>>
>>> > I hope that this can help make PDL more consistent and easily
>>> > testable. There are still small inconsistencies that shouldn't be
>>> > there, which can be weeded out with testing. For example, what
>>> > type is expected for this code?
>>> >
>>> > ```perl
>>> > use PDL;
>>> > print stretcher( sequence(float, 3) )->type;
>>> > ```
>>> >
>>> > I would expect 'float', but it is actually 'double' under PDL
>>> > v2.007_04.
>>>
>>> This is a bug. One thing that would be nice to have is a way to
>>> trace the dataflow characteristics through the PDL processing
>>> chains...
>>>
>>> > ## Incremental computation
>>> >
>>> > I find that the way I grow my code is to slowly add modules that
>>> > work together in a pipeline. Running and rerunning this code
>>> > through all the modules is slow. To avoid that, I create multiple
>>> > small programs that read and write files to pass data from one
>>> > script to the next. I was looking for a solution and came across
>>> > IncPy <http://www.pgbovine.net/incpy.html>. It modifies the
>>> > Python interpreter to support automatic persistent memoization. I
>>> > don't think the idea has caught on, but I think it should, and
>>> > perhaps Perl and PDL are flexible enough to herald it as a CPAN
>>> > module.
>>>
>>> Nice idea for improvement and ease of use. If PDL methods are
>>> implemented to be compatible with Moo[se], then method modifiers
>>> could be used for this.
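For what it's worth, a persistent-memoization sketch is possible today with the core Memoize module. A minimal, hypothetical example (expensive_step is a stand-in for a slow pipeline stage; real PDL arguments would need a custom NORMALIZER to produce sensible cache keys):

```perl
use strict;
use warnings;
use Memoize;
use Memoize::Storable;

# Keep results on disk, IncPy-style: rerunning the script skips any
# step whose arguments have been seen in an earlier run.
tie my %cache => 'Memoize::Storable', 'pipeline-cache.db';
memoize('expensive_step', SCALAR_CACHE => [HASH => \%cache]);

sub expensive_step {
    my ($n) = @_;
    sleep 2;            # stand-in for a slow computation
    return $n ** 2;
}

my $result = expensive_step(42);   # slow on the first run only
print "$result\n";
```

A Moo[se] `around` modifier could hang the same caching onto PDL methods without touching their bodies, as suggested above.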
>>>
>>> Thanks for the thoughts!
>>> Chris
>
> --
> "Debugging is twice as hard as writing the code in the first place.
> Therefore, if you write the code as cleverly as possible, you are,
> by definition, not smart enough to debug it." -- Brian Kernighan

_______________________________________________
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl