Re: [Perldl] [Pdl-porters] need use cases for new PDL datatypes

David Mertens Wed, 12 Jun 2013 08:42:35 -0700

Here's an example of what I have in mind, in very rough form:

$pdl_svs = pdl_sv(3);  # note distinct constructor


$pdl_svs(0) .= My::Class->new;
$pdl_svs(1) .= Other::Class->new;
$pdl_svs(2) .= My::ThirdClass->new;

# Call the printout method on all three SVs
$pdl_svs->call_SV_method('printout');

# Equivalent code, probably faster:
for my $obj (My::Class->new, Other::Class->new, My::ThirdClass->new) {
    $obj->printout;
}

Obviously, if you tried calling "call_SV_method" on a double piddle, it
would have to croak.

...

Actually, the more that I think about it, I feel like an SV type would make
a lot more sense as a type derived from a generic Pointer type. PDL::SV
would inherit from PDL and would use the pointer type exclusively. It could
also include casting to PDL types. At any rate, calling a method on these
SVs would require a full Perl function invocation, including preparing the
stack, fetching the method, invoking the method, trapping errors, etc. It'd
be slow as molasses FOR THIS DATATYPE, but the other datatypes would work
just fine, and it would make PDL a far more generic data container.
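
To make the per-element cost concrete, here is a minimal plain-Perl sketch
of the work a call like call_SV_method would have to do for every element:
check that it is an object, fetch the method, invoke it, and trap errors.
It does not use PDL at all. The function name call_method_on_each and the
three toy classes are hypothetical, made up for illustration only, and the
array ref merely stands in for the proposed SV piddle.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(blessed);

# Hypothetical stand-in for the proposed call_SV_method (plain Perl, no PDL).
# Takes an array ref of objects and a method name, and returns an array ref
# holding one result per element, in order.
sub call_method_on_each {
    my ($objects, $method, @args) = @_;
    my @results;
    for my $i (0 .. $#$objects) {
        my $obj = $objects->[$i];
        croak "element $i is not an object" unless blessed $obj;

        # Fetch the method: one lookup per element, since classes may differ.
        my $code = $obj->can($method)
            or croak "element $i (" . ref($obj) . ") has no method '$method'";

        # Invoke it, trapping exceptions so the failing index can be reported.
        my $result;
        my $ok = eval { $result = $obj->$code(@args); 1 };
        croak "element $i died in '$method': $@" unless $ok;

        push @results, $result;
    }
    return \@results;
}

# Three throwaway classes, each with its own printout method.
package My::Class      { sub new { bless {}, shift } sub printout { 'first'  } }
package Other::Class   { sub new { bless {}, shift } sub printout { 'second' } }
package My::ThirdClass { sub new { bless {}, shift } sub printout { 'third'  } }

package main;

my $objects = [ My::Class->new, Other::Class->new, My::ThirdClass->new ];
my $results = call_method_on_each($objects, 'printout');
print join(', ', @$results), "\n";    # prints "first, second, third"
```

Every step inside the loop happens at the Perl level, which is why this
could never compete with the C loops used for the numeric types.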

OK, sorry, I'm ranting. I should turn my attention to other things. I'm
still recovering from a cold, and my ideas feel a bit scattered from the
meds.

David


On Wed, Jun 12, 2013 at 10:10 AM, Craig DeForest
<[email protected]> wrote:

> Sounds interesting, David.
>
> If I understand right, you're asking for a container type that:
>
>  - implements its own threading/retrieval hooks
>  - can in principle encapsulate anything that Perl can handle
>  - has some sort of numerical-index structure to it to follow the Perl
> indices
>
> That's pretty general, and looks to be sort of like what Larry was talking
> about in Synopsis 9 (http://perlcabal.org/syn/S09.html#PDL_support) some
> time ago.  "fast looping over your SVs" seems difficult to me, since Perl
> loops are already pretty darned fast for what they are.  There seem to be
> two ways this could go, which would be helpful:
>
> (1) generalize current PDL calling conventions to SVs/lists, basically as
> syntactic sugar for thread_define() -- convenient but not fast (compared to
> PDL2);
> (2) generalize current PDL under-the-hood threading to enable multiple
> types of thread retrieval, such that PDL2 style calls with PDLs work the
> way they do now, but other things (such as database retrievals, etc.)
> benefit from batch operation the way PDLs do now.
>
> It seems to me that (1) is a reasonable goal for PDL3 (it could work on
> top of the existing threading engine), but that generalizing it too much
> might be a waste of effort -- after all, with perl-level looping it can
> never get particularly fast.  On the other hand, (2) is a far larger
> problem that would require formalizing the internal API more than has been
> done, and could require a complete overhaul of the threading engine.
>
> On Jun 12, 2013, at 8:56 AM, David Mertens <[email protected]>
> wrote:
>
> The idea I had in mind was to make a PDL_SV that could hold a reference to
> any type, not just array-like types. It would basically enable fast looping
> over your SVs. This would turn PDL into a general N-dimensional data
> container. Limiting the PDL_SV type only to references to arrays is not as
> general as I would like.
>
> As for allowing blessed arrays to masquerade as PDL objects, I think
> that's tangential to the discussion. Cool idea, and probably a very
> difficult one, but not necessary from the standpoint of PDL types.
>
> David
>
>
> On Tue, Jun 11, 2013 at 12:53 PM, Craig DeForest
> <[email protected]> wrote:
>
>> I imagine being able to bless an array ref into a PDL, and have it
>> autodetected, just like hashes are autodetected now...?
>>
>> On Jun 11, 2013, at 11:52 AM, Chris Marshall <[email protected]>
>> wrote:
>>
>> > Would it need to be specifically a PDL?
>> > I could imagine arrays of objects of any style
>> > that could fit that framework.  Of course, a
>> > specialization for the PDL case would be a
>> > common optimization.
>> >
>> > This would actually be very nice as a way
>> > to apply PDL threading and looping to perl
>> > level constructs---maybe both directions.
>> >
>> > --Chris
>> >
>> > On Tue, Jun 11, 2013 at 1:44 PM, Craig DeForest
>> > <[email protected]> wrote:
>> >> I think this is a nifty idea, if it can be made to interoperate with the
>> >> rest of the threading engine without slowing everything down.  Maybe the
>> >> SV * type should operate only on array refs...?
>> >>
>> >> The only way I can see it working is if we make it "superior" to PDL_D in
>> >> the type hierarchy, so everything is promoted if possible (or demoted if
>> >> necessary) when a PDL_SV is encountered.  Selection operations would
>> >> require some thought - basically a new case for each of slicing, dicing,
>> >> and indexing.
>> >>
>> >>
>> >>
>> >> On Jun 11, 2013, at 11:31 AM, David Mertens <[email protected]>
>> >> wrote:
>> >>
>> >> I would really like to have an SV* PDL type, as well as a PDL method
>> >> 'invoke' that would call a supplied function reference or method name on
>> >> each SV*. Said method would croak on all but SV* types, of course.
>> >>
>> >> Obviously, any operations on this would be really slow compared to bare C
>> >> types, but they could also be far more diverse. I could easily write a
>> >> PDL::Drawing::Prima method that could take x, y, and (possibly utf-8)
>> >> strings and thread over coordinates and strings correctly. With
>> >> PDL::Char, I only get ASCII, and all the strings are allocated to the
>> >> longest needed string, which is not quite what I would like.
>> >>
>> >> It would also be possible to have each SV* be a reference to another PDL
>> >> with arbitrary shape, allowing Edward to achieve what he wants.
>> >>
>> >> David
>> >>
>> >>
>> >> On Tue, May 28, 2013 at 2:57 AM, Edward Baudrez
>> >> <[email protected]> wrote:
>> >>>
>> >>> On Mon, May 27, 2013 at 4:28 PM, Chris Marshall
>> >>> <[email protected]> wrote:
>> >>>> For example, from the PDL::IO::HDF5 README:
>> >>>>
>> >>>>     This package provides an object-oriented interface for the
>> >>>>     PDL package to the HDF5 data-format. Information on the
>> >>>>     HDF5 Format can be found at the NCSA's web site at
>> >>>>     http://hdf.ncsa.uiuc.edu/ .
>> >>>>
>> >>>>     LIMITATIONS
>> >>>>
>> >>>>     Currently this interface only provides a subset of the total
>> >>>>     HDF5 library's capability.
>> >>>>
>> >>>>    o Only HDF5 Simple datatypes are supported. No HDF5 Compound
>> >>>>      datatypes are supported since PDL doesn't support them.
>> >>>>
>> >>>>    o Only HDF5 Simple dataspaces are supported.
>> >>>>
>> >>>> So clearly, PDL has a need for <something new>.  :-)
>> >>>> Your responses will help to prioritize and select the
>> >>>> implementation of this feature for PDL3.
>> >>>>
>> >>>> Thanks in advance for your replies.
>> >>>> Chris Marshall
>> >>>> PDL-3.000 release manager
>> >>>
>> >>> Hi
>> >>>
>> >>>
>> >>> I am sorry I haven't spoken up earlier, but I do have an idea for a 'new'
>> >>> data type that you may want to consider. As you may know, I wrote a
>> >>> multidimensional binning/histogramming library for PDL
>> >>> (https://metacpan.org/module/PDL::NDBin). The idea is very simple: data
>> >>> points are classified into (fixed-width) bins, much like histogram(). My
>> >>> library also allows arbitrary callbacks on the data, so that, after
>> >>> classification into the bins, you can perform any kind of computation on
>> >>> the data values inside the bins (not just counting them, like
>> >>> histogram()). I use it, for example, to classify satellite data collected
>> >>> all over the globe in latitude/longitude boxes, and calculate mean and
>> >>> standard deviation inside every latitude/longitude box.
>> >>>
>> >>> The actions I've implemented so far (count, sum, average, standard
>> >>> deviation, minimum and maximum) are all reductions, so I end up with one
>> >>> value per bin. The final values for all the bins are grouped into a
>> >>> piddle of one of the standard types. I need this functionality to handle
>> >>> multidimensional binning with an algorithm that is essentially
>> >>> one-dimensional (i.e., I use reshape() to convert the internal,
>> >>> one-dimensional piddle holding all the return values from the bins into
>> >>> an N-dimensional piddle).
>> >>>
>> >>> But now I am thinking of creating an action that would collect the data
>> >>> values in the bins (this would be useful for plotting or regression).
>> >>> Obviously the number of data values per bin would differ from bin to
>> >>> bin. So if there were a data type in PDL that would essentially hold
>> >>> other piddles, instead of raw C data values, that would be very
>> >>> convenient. (A piddle containing piddles.)
>> >>>
>> >>> I admit that I haven't thought through this very thoroughly. It may even
>> >>> be infeasible. But you asked for suggestions ;-)
>> >>>
>> >>> I imagine the above may not be very clear. Let me know if you want more
>> >>> information.
>> >>>
>> >>>
>> >>>
>> >>> Best regards
>> >>> & All my sympathy for the continuing development of PDL - Much
>> >>> appreciated!
>> >>>
>> >>> Edward
>> >>>
>> >>> _______________________________________________
>> >>> PDL-porters mailing list
>> >>> [email protected]
>> >>> http://mailman.jach.hawaii.edu/mailman/listinfo/pdl-porters
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> "Debugging is twice as hard as writing the code in the first place.
>> >>  Therefore, if you write the code as cleverly as possible, you are,
>> >>  by definition, not smart enough to debug it." -- Brian Kernighan
>> >> _______________________________________________
>> >> PDL-porters mailing list
>> >> [email protected]
>> >> http://mailman.jach.hawaii.edu/mailman/listinfo/pdl-porters
>> >>
>> >>
>> >
>>
>>
>
>
> --
>  "Debugging is twice as hard as writing the code in the first place.
>   Therefore, if you write the code as cleverly as possible, you are,
>   by definition, not smart enough to debug it." -- Brian Kernighan
>
>
>


-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
