Contents
0. Introduction
1. A trivial example
2. Typical solutions
3. PDL vs. NetCDF Operators
4. Conclusion
0. Introduction
The second problem I'd like to discuss is different from the
first in that there /are/ known solutions for some particular
cases, yet there isn't, to the best of my knowledge, a single
approach generic enough to become a part of a generic array
processor, such as PDL.
This problem is of a greater perceived significance, as it may
easily lead to increased development (debugging) time. Also,
since this problem is, at least in part, solved by other
software, it readily makes PDL seem inferior to such software.
The problem is that the dimensions of a PDL variable lack any
information whatsoever on the /meaning/ of the indices.
1. A trivial example
Consider, e. g., that there are two PDL variables, $t1 and $t2,
which contain series of regularly-sampled temperatures at two
distinct locations, which we've loaded from some data file or
files. Consider also that we, for some reason, need to
compute the difference between the temperatures sampled at the
corresponding moments of time. Can it be as simple as, say, the
following?
    my $tdiff = $t1 - $t2;
Unfortunately, it can't, as we're yet to be sure that the
corresponding elements of $t1 and $t2 were sampled at the same
time. IOW, we're yet to be sure that the /mapping/ of indices
to the values of a /physical quantity/ (time) is the same for
both of the variables.
2. Typical solutions
How is this problem typically solved? First of all, we need a
way to encode the mapping. In the most common case, the mapping
is assumed to be linear, and thus can be defined as a pair of
scalars: the step, and the offset. Only having ensured that
both are the same for both of the variables, we can proceed with
the computation. Otherwise, we may choose to use subsampling or
interpolation in order to get the mappings to match each other.
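A minimal sketch of the check described above, assuming each
series is kept in a hypothetical hash that holds the samples
together with the step and the offset of its linear mapping. The
hash layout and the same_mapping helper are made up for
illustration; neither is a part of PDL.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical layout: the samples plus the linear mapping
#   time(i) = offset + step * i   (here in seconds).
my %s1 = (
    data   => pdl(10.0, 10.5, 11.0, 11.5),
    step   => 3600,
    offset => 0,
);
my %s2 = (
    data   => pdl(9.0, 9.5, 10.5, 11.0),
    step   => 3600,
    offset => 0,
);

# True if both series map their indices to the same moments of time.
sub same_mapping {
    my ($p, $q) = @_;
    return $p->{step}   == $q->{step}
        && $p->{offset} == $q->{offset}
        && $p->{data}->nelem == $q->{data}->nelem;
}

die "time axes differ; resample or interpolate first\n"
    unless same_mapping(\%s1, \%s2);

# Only now is the element-wise difference meaningful.
my $tdiff = $s1{data} - $s2{data};
print "$tdiff\n";
```

Note that the check has to be repeated, by hand, before every
such operation, which is exactly the burden discussed here.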
For multi-dimensional data, the order of indices also becomes
significant. There are some differences in how this problem is
addressed by various software. In particular, the raster engine of
the GRASS GIS assumes that, roughly speaking, the minor
dimension corresponds to the west to east direction, while the
major one corresponds to the south to north direction. No other
number of dimensions but two is allowed. (Such a solution is
clearly /not/ for a generic array processor, like PDL.)
Many major image formats employ more or less the same solution,
by requiring, e. g., that the inner dimension correspond to the
primary color (red, green or blue), the middle dimension
correspond to the left to right direction, and the major one
correspond to the top to bottom direction. (TIFF is among the
notable exceptions, as it allows different layouts.)
NetCDF, a prominent multi-dimensional data format, allows the
individual dimensions to be explicitly named. The software
processing NetCDF files may then choose to /orient/ (i. e.,
permute the dimensions of) the variables involved in a
computation so that their dimensions having the same name will
have matching positions in the list of indices.
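The orientation step could be sketched in PDL roughly as
follows, assuming the dimension names are tracked by hand in
plain Perl arrays next to the data. The orient helper and the
'lon' and 'lat' names are hypothetical; only sequence, reorder
and dims are actual PDL calls.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical helper: permute the dimensions of $data, which are
# labelled (in index order) by @$have, into the order given by @$want.
sub orient {
    my ($data, $have, $want) = @_;
    my %pos;
    @pos{@$have} = 0 .. $#$have;
    my @perm = map {
        defined $pos{$_} ? $pos{$_} : die "no dimension named '$_'\n"
    } @$want;
    return $data->reorder(@perm);
}

my $p = sequence(3, 2);    # dimensions: lon (3), lat (2)
my $q = sequence(2, 3);    # dimensions: lat (2), lon (3)

# Bring $q into the (lon, lat) order used by $p before subtracting.
my $q_oriented = orient($q, ['lat', 'lon'], ['lon', 'lat']);
my $diff = $p - $q_oriented;

print join(' x ', $diff->dims), "\n";
```

With the names kept outside of the variables, nothing prevents
the two from getting out of sync, of course.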
Some NetCDF-related materials mention the concept of a
/coordinate variable/: a one-dimensional variable associated
with a named dimension, which holds the values of some
physical quantity corresponding to the whole range of the index
values. This feature allows for completely arbitrary mappings.
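As an illustration of what a coordinate variable buys us, here
is a sketch in which each temperature series is accompanied by
a one-dimensional variable holding its sampling times. The
variable names and the sample values are made up; interpol is
the linear interpolation routine that PDL provides.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical data: each series comes with its own coordinate
# variable holding the sampling time (in minutes) for every index.
my $time1 = pdl(10, 20, 30);
my $t1    = pdl(11, 12, 13);
my $time2 = pdl(0, 20, 40);     # a different, coarser sampling
my $t2    = pdl(5, 7, 9);

# Linearly interpolate $t2 onto the moments at which $t1 was
# sampled.  All of $time1 lies strictly inside the range of $time2,
# so nothing has to be extrapolated.
my $t2_on_1 = interpol($time1, $time2, $t2);

# Both operands now refer to the same moments of time.
my $tdiff = $t1 - $t2_on_1;
print "$tdiff\n";
```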
3. PDL vs. NetCDF Operators
The NetCDF Operators (NCO) implement the support for the named
dimensions feature of NetCDF. (And also for the NetCDF Climate
and Forecast (CF) Metadata Conventions, which I'm not yet
familiar with.)
Thus, e. g., the user invoking the following command may be sure
that the right thing is done, irrespective of the internal
layout of the multi-dimensional data that inhabits the source
datasets:
    $ ncbo --op_typ=sub data1.nc data2.nc difference.nc
A mere convenience? It is more than that, for both the
developers and the data providers.
For the former, this behavior means that software based on
semantically-aware building blocks like the one above will
not require modification should a data provider suddenly change
the internal layout of a dataset.
For the latter, it, conversely, gives more freedom to change the
internal layout as it becomes necessary, without any of: giving
early warnings to the users of the data, providing the data in
both flavors, or risking the loss of compatibility.
Unfortunately, reading the contents of a NetCDF variable into a
PDL variable results in the loss of such semantic information.
Although this information may be read and tracked separately,
doing so may put an extra burden on the developer and reduce the
readability of the code, perhaps to the point where it becomes
impractical to pursue layout-independence.
Previously, I've noted that there's a problem with software
relying on some particular ordering of dimensions in the
datasets created by some other software: such software is
limited in the datasets it can be applied to without
modification, and it also constrains the data provider to the
once-chosen data layout. In fact, the same reasoning applies to
the building blocks the software is made from: the functions.
Thus, as of the current version of PDL, the order of the
dimensions becomes a part of the function's signature, with all
the (negative) consequences thereof.
4. Conclusion
Now, my question is: does it seem feasible to add semantic
information to the PDL dimensions?
The mere association of coordinate variables of some kind with
the dimensions of regular PDL variables shouldn't be hard to
implement. However, the necessity to maintain this information
throughout the computation may imply some extra burden on the
implementations of the PDL functions.
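One way the association alone might look, as a rough sketch:
PDL variables already carry a free-form header hash, reachable
through the hdr method, and the names and coordinate variables
could be stored there. The dim_names and coords keys are a
convention assumed here for illustration, not anything PDL
itself defines or interprets.

```perl
use strict;
use warnings;
use PDL;

my $t = pdl(10.0, 10.5, 11.0);

# Assumed convention (not a PDL standard): the header hash stores
# the dimension names in index order, and one coordinate variable
# per name.
$t->hdr->{dim_names} = ['time'];
$t->hdr->{coords}    = { time => pdl(0, 3600, 7200) };

# Look up the coordinate variable of dimension 0 again.
my $name   = $t->hdr->{dim_names}[0];
my $coords = $t->hdr->{coords}{$name};
print "$name: $coords\n";
```

This only covers storing the information; keeping it consistent
when dimensions are permuted, sliced or reduced is the part
that would need support from the functions themselves.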
Also, there's the question of how the behavior should be altered
in the presence of semantically-tagged variables. E. g., if the
only dimension of $a is time, and the only dimension of $b is
power, should $a + $b result in a variable having both of these
dimensions? (IOW, should an implicit cross product be
computed?)
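For comparison, this is roughly how such a cross product has to
be requested explicitly in PDL as it is now, by inserting dummy
dimensions by hand. The variables are named $x and $y here, so
as not to clash with Perl's special sort variables, and the
values are made up.

```perl
use strict;
use warnings;
use PDL;

my $x = pdl(1, 2, 3);    # say, the only dimension is time
my $y = pdl(10, 20);     # say, the only dimension is power

# Give each operand a size-1 dummy dimension in the position where
# the other one has its data, so that PDL expands both to 3 x 2.
my $outer = $x->dummy(1) + $y->dummy(0);

print join(' x ', $outer->dims), "\n";
print "$outer\n";
```

Here $outer has one element per (time, power) pair. With
semantically-tagged dimensions, the positions of the dummy
dimensions could presumably be derived from the names instead.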
TIA.
--
FSF associate member #7257
