On May 28, 2013 3:58 AM, "Edward Baudrez" <[email protected]> wrote:
>
> On Mon, May 27, 2013 at 4:28 PM, Chris Marshall <[email protected]> wrote:
> > For example, from the PDL::IO::HDF5 README:
> >
> > This package provides an object-oriented interface for the
> > PDL package to the HDF5 data-format. Information on the
> > HDF5 Format can be found at the NCSA's web site at
> > http://hdf.ncsa.uiuc.edu/ .
> >
> > LIMITATIONS
> >
> > Currently this interface only provides a subset of the total
> > HDF5 library's capability.
> >
> > o Only HDF5 Simple datatypes are supported. No HDF5 Compound
> > datatypes are supported since PDL doesn't support them.
> >
> > o Only HDF5 Simple dataspaces are supported.
> >
> > So clearly, PDL has a need for <something new>. :-)
> > Your responses will help to prioritize and select the
> > implementation of this feature for PDL3.
> >
> > Thanks in advance for your replies.
> > Chris Marshall
> > PDL-3.000 release manager
>
> Hi
>
>
> I am sorry I haven't spoken up earlier, but I do have an idea for a 'new' data type that you may want to consider. As you may know, I wrote a multidimensional binning/histogramming library for PDL (https://metacpan.org/module/PDL::NDBin). The idea is very simple: data points are classified into (fixed-width) bins, much like histogram(). My library also allows arbitrary callbacks on the data, so that, after classification into the bins, you can perform any kind of computation on the data values inside the bins (not just counting them, like histogram()). I use it, for example, to classify satellite data collected all over the globe in latitude/longitude boxes, and calculate mean and standard deviation inside every latitude/longitude box.
>
> The actions I've implemented so far (count, sum, average, standard deviation, minimum and maximum) are all reductions, so I end up with one value per bin. The final values for all the bins are grouped into a piddle of one of the standard types. I need this functionality to handle multidimensional binning with an algorithm that is essentially one-dimensional (i.e., I use reshape() to convert the internal, one-dimensional piddle holding all the return values from the bins into an N-dimensional piddle).
>
> But now I am thinking of creating an action that would collect the data values in the bins (this would be useful for plotting or regression). Obviously the number of data values per bin would be different for all the bins. So if there was a data type in PDL that would essentially hold other piddles, instead of raw C data values, that would be very convenient. (A piddle containing piddles).
>
> I admit that I haven't thought through this very thoroughly. It may even be infeasible. But you asked for suggestions ;-)
>
> I imagine the above may not be very clear. Let me know if you want more information.
>
>
>
> Best regards
> & All my sympathy for the continuing development of PDL - Much appreciated!
>
> Edward
>
> _______________________________________________
> PDL-porters mailing list
> [email protected]
> http://mailman.jach.hawaii.edu/mailman/listinfo/pdl-porters
>
You could also look at the idea of supporting a concept like the data
frame of R/pandas.

Doug
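
For readers unfamiliar with the "collect" action Edward describes, here is a minimal sketch of the idea in plain Python rather than PDL. It is not how PDL::NDBin is implemented, and it uses no PDL API: the function name collect_into_bins, its parameters, and the sample latitude and measurement values are all made up for illustration. It shows why the result is ragged (one variable-length list per bin) while the existing reductions produce exactly one value per bin.

```python
import math


def collect_into_bins(values, coords, start, step, nbins):
    """Group values by the fixed-width bin their coordinate falls into.

    Bin i covers the half-open interval
    [start + i*step, start + (i+1)*step).
    Points outside every bin are dropped.
    Returns a list of nbins lists, which may differ in length.
    """
    bins = [[] for _ in range(nbins)]
    for value, coord in zip(values, coords):
        index = math.floor((coord - start) / step)
        if 0 <= index < nbins:
            bins[index].append(value)
    return bins


# Made-up sample data: one measurement per latitude.
latitudes = [-80.0, -10.0, -5.0, 30.0, 35.0, 85.0]
measurements = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]

# Three 60-degree latitude bands covering -90..90.
bins = collect_into_bins(measurements, latitudes,
                         start=-90.0, step=60.0, nbins=3)

# The ragged result: a different number of values in each bin.
print(bins)

# Reductions collapse each bin to one value, so they fit a regular array.
counts = [len(b) for b in bins]
means = [sum(b) / len(b) if b else float("nan") for b in bins]
print(counts)
print(means)
```

The counts and means fit an ordinary fixed-size array, which is what the existing reductions rely on. The collected values do not, which is the gap a "piddle containing piddles" (or a data-frame-like structure) would fill.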
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
