Even, thanks for the thoughtful answer,
On 16/08/2022 21:57, Even Rouault wrote:
Edzer,
The currently support data types for variables (including indexing
variables of dimensions) or attributes are either numeric (byte, int,
float, etc.), strings or compound data types of previous types. From a
quick thinking, I can't think of strong reasons why OGRGeometry couldn't
be added, baring some changes in GDAL core (looking for specificities of
string support should give good hints where to make changes).
That said, it would probably only used by the netCDF driver, so it is a
bit hard to justify to add a new abstraction for just one user
(user=driver). And implementing that in the netCDF driver wouldn't
necessarily be easy as it is already quite complicated and I'm not sure
how reusable the part that deal with SG geometries is.
I think NetCDF + Zarr; the NetCDF community seems to move to Zarr at
large scale, I have the impression that they keep the CDL / NetCDF data
model - the one abstracted in the MDA API. It looks like CMIP6 will be
largely distributed in Zarr on cloud storage forms, see
https://pangeo-data.github.io/pangeo-cmip6-cloud/overview.html. I would
be surprised if the Zarr folks would come with a definition for
geometries different from CF if CF gets used; building this support in
GDAL would be an incentive for them to not do so.
On your examples,
it is not immediately obvious to me how the driver would expose a
OGRGeometry data type. In the first one, it would have to synthesize a
t2_coord variable from t2_coordX and t2_coordY ? And on the second one a
xy pseudo variable from x and y ? But it would be a bit convoluted for
the users of the multi-dimensional indexed Precipitation to figure out
that stations has a variable indexed by stations that has geometries.
Geometry sets can indeed best be seen as synthetic variables that are
associated with a dimension (t2_node_coordinates in the first example of
my previous email, station in the second).
To find the OGRGeometry dimension,
- look for a variable v that has node_coordinates as attribute
- if v:geometry_type == "point", then
- look for a variable w that has 'w:axis = "X"' as attribute
- the dimension of w is the OGRGeometry dimension
- if v:geometry_type == "polygon" or "line", then
- look for the variable mentioned in v:node_count; the dimension
associated with this variable is the OGRGeometry dimension
GetOGRGeometries() could be a member function of GDALDimension,
returning NULL if no CF-compliant geometries can be constructed
associated with the dimension.
At the end of this email are the two same examples but now with polygons
rather than points.
(in the second example I posted earlier a 'geometry:node_coordinates =
"x y";' is missing)
So all in all, I believe it should be doable but the ratio
benefit/effort doesn't seem to me to be super favorable.
I see your point, for me it's a chicken and egg problem. How do we now
handle the output when querying a data cube with monthly temperature
projections for the next 30 years for a set of climate models and a set
of scenarios at a set of given point locations, or get the maximum
temperature over a set of given regions? I think there's a big gap now
between the GIS community and the modelling communities; your MDA API
helps closing it, and we can do better.
I'll talk more about this next week Thu at our FOSS4G presentation, and
hope to meet you there IRL!
Many regards,
Even
Le 16/08/2022 à 12:47, Edzer Pebesma a écrit :
I've been using GDAL's C++ multidimensional array (MDA) API lately to
read and write data cubes from R - excellent work! I was looking into
support for vector data cubes, multidimensional arrays with a single
dimension associated with a set of geometries.
What we can do is read and write dimensions, variables, and
attributes, but what (as far as I can tell) is missing is to read and
write OGRGeometry variables, using the CF specification for geometry
[1], which is supported by the NetCDF vector driver. Would it be
feasible to read and write CF-compliant geometries from
multidimensional arrays through the MDA API? Right now software
writers have to sort this out themselves, which is straightforward for
points but cludgy for polygons. Another issue is crs handling which I
think could be done gracefully using the OGRGeometry interface.
As a minimal data example: I added two ncdump files below for two time
instances and two POINT geometries. The first can be read and written
by the OGR API, but is a one-dimensional array that lacks the time
information of time series (time distributed over attributes/columns).
The second is a 2 x 2 multidimensional array with the time
information. GDAL's MDA API can read it but doesn't recognize
geometries, GDAL's OGR API cannot read it (obviously). I'd like to be
able to read (and write) the station dimension of the second one as an
OGRGeometry, for points, lines or polygons.
[1]
https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#geometries
--
Edzer Pebesma
Institute for Geoinformatics
Heisenbergstrasse 2, 48151 Muenster, Germany
Phone: +49 251 8333081
First, using OGR:
netcdf t2 {
dimensions:
t2_node_coordinates = 13 ;
t2_node_count = 2 ;
t2_part_node_count = 3 ;
variables:
float t2 ;
t2:geometry_type = "polygon" ;
t2:node_coordinates = "t2_coordX t2_coordY" ;
t2:node_count = "t2_node_count" ;
t2:part_node_count = "t2_part_node_count" ;
t2:interior_ring = "t2_interior_ring" ;
int t2_node_count(t2_node_count) ;
int t2_part_node_count(t2_part_node_count) ;
int t2_interior_ring(t2_part_node_count) ;
double t2_coordX(t2_node_coordinates) ;
t2_coordX:axis = "X" ;
double t2_coordY(t2_node_coordinates) ;
t2_coordY:axis = "Y" ;
double t2_field_t1(t2_node_count) ;
t2_field_t1:long_name = "Field t1" ;
t2_field_t1:geometry = "t2" ;
t2_field_t1:ogr_field_name = "t1" ;
t2_field_t1:ogr_field_type = "Real" ;
double t2_field_t2(t2_node_count) ;
t2_field_t2:long_name = "Field t2" ;
t2_field_t2:geometry = "t2" ;
t2_field_t2:ogr_field_name = "t2" ;
t2_field_t2:ogr_field_type = "Real" ;
// global attributes:
:Conventions = "CF-1.8" ;
:GDAL = "GDAL 3.4.3, released 2022/04/22" ;
:history = "Wed Aug 17 15:59:23 2022: GDAL Create( t2.nc, ...
)" ;
data:
t2 = _ ;
t2_node_count = 8, 5 ;
t2_part_node_count = 4, 4, 5 ;
t2_interior_ring = 0, 1, 0 ;
t2_coordX = 0, 1, 1, 0, 0.5, 0.7, 0.7, 0.5, 3, 4, 4, 3, 3 ;
t2_coordY = 0, 0, 1, 0, 0.5, 0.5, 0.7, 0.5, 3, 3, 4, 4, 3 ;
t2_field_t1 = 0.33, 0.32 ;
t2_field_t2 = 0.04, 0.4 ;
}
Second: using MDA
netcdf t1 {
dimensions:
part = 3 ;
node = 13 ;
time = 2 ;
stations = 2 ;
variables:
double Precipitation(time, stations) ;
Precipitation:coordinates = "x y" ;
double stations(stations) ;
double time(time) ;
time:units = "days since 1970-01-01" ;
double node(node) ;
double x(node) ;
x:axis = "X" ;
double y(node) ;
y:axis = "Y" ;
double node_count(stations) ;
double part_node_count(part) ;
double interior_ring(part) ;
double geometry ;
geometry:geometry_type = "polygon" ;
geometry:node_count = "node_count" ;
geometry:node_coordinates = "x y" ;
geometry:part_node_count = "part_node_count" ;
geometry:interior_ring = "interior_ring" ;
// global attributes:
:Conventions = "CF-1.6" ;
data:
Precipitation =
0.33, 0.04,
0.32, 0.4 ;
stations = 1, 2 ;
time = 19114, 19115 ;
node = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 ;
x = 0, 1, 1, 0, 0.5, 0.7, 0.7, 0.5, 3, 4, 4, 3, 3 ;
y = 0, 0, 1, 0, 0.5, 0.5, 0.7, 0.5, 3, 3, 4, 4, 3 ;
node_count = 8, 5 ;
part_node_count = 4, 4, 5 ;
interior_ring = 0, 1, 0 ;
geometry = _ ;
}
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev