Re: [gdal-dev] reading/writing geometries using GDAL's MDA API

2022-08-17 Thread Edzer Pebesma

Even, thanks for the thoughtful answer,

On 16/08/2022 21:57, Even Rouault wrote:

Edzer,

The currently supported data types for variables (including indexing 
variables of dimensions) or attributes are either numeric (byte, int, 
float, etc.), strings or compound data types of the previous types. After a 
quick think, I can't see strong reasons why OGRGeometry couldn't 
be added, barring some changes in GDAL core (looking at the specifics of 
string support should give good hints about where to make changes).


That said, it would probably only be used by the netCDF driver, so it is a 
bit hard to justify adding a new abstraction for just one user 
(user=driver). And implementing that in the netCDF driver wouldn't 
necessarily be easy as it is already quite complicated and I'm not sure 
how reusable the part that deals with SG geometries is.


I think NetCDF + Zarr: the NetCDF community seems to be moving to Zarr at 
large scale, and I have the impression that they keep the CDL / NetCDF data 
model - the one abstracted in the MDA API. It looks like CMIP6 will be 
largely distributed in Zarr form on cloud storage, see 
https://pangeo-data.github.io/pangeo-cmip6-cloud/overview.html. I would 
be surprised if the Zarr folks came up with a definition for 
geometries different from CF if CF gets used; building this support in 
GDAL would be an incentive for them not to do so.


On your examples, 
it is not immediately obvious to me how the driver would expose an 
OGRGeometry data type. In the first one, it would have to synthesize a 
t2_coord variable from t2_coordX and t2_coordY? And in the second one an 
xy pseudo variable from x and y? But it would be a bit convoluted for 
the users of the multidimensional indexed Precipitation to figure out 
that stations has a variable indexed by stations that has geometries.


Geometry sets can indeed best be seen as synthetic variables that are 
associated with a dimension (t2_node_coordinates in the first example of 
my previous email, station in the second).


To find the OGRGeometry dimension,
- look for a variable v that has node_coordinates as attribute
- if v:geometry_type == "point", then
   - look for a variable w that has 'w:axis = "X"' as attribute
   - the dimension of w is the OGRGeometry dimension
- if v:geometry_type == "polygon" or "line", then
   - look for the variable mentioned in v:node_count; the dimension
     associated with this variable is the OGRGeometry dimension
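
The lookup above can be sketched in a few lines of Python. This is only 
an illustration under stated assumptions, not GDAL code: the dataset is 
modelled as a plain dict (variable name -> dims and attrs) instead of 
going through the MDA API, the function name find_geometry_dimension is 
made up, and the two example datasets are hypothetical stand-ins for the 
ncdump files. For points, the sketch only searches the variables listed 
in node_coordinates for the one with axis = "X", which is slightly 
narrower than the rule as written.

```python
def find_geometry_dimension(variables):
    """Return the name of the dimension that carries geometries, or None.

    `variables` maps a variable name to {"dims": tuple, "attrs": dict};
    this layout is an assumption made for the sketch.
    """
    for var in variables.values():
        attrs = var["attrs"]
        if "node_coordinates" not in attrs:
            continue  # not a geometry container variable
        gtype = attrs.get("geometry_type")
        if gtype == "point":
            # The X node coordinate variable is indexed by the geometry dimension.
            for coord_name in attrs["node_coordinates"].split():
                w = variables.get(coord_name)
                if w is not None and w["attrs"].get("axis") == "X":
                    return w["dims"][0]
        elif gtype in ("line", "polygon"):
            # The node_count variable is indexed by the geometry dimension.
            counts = variables.get(attrs.get("node_count"))
            if counts is not None:
                return counts["dims"][0]
    return None


# Hypothetical point example: two stations, two time steps.
points = {
    "geometry": {"dims": (), "attrs": {"geometry_type": "point",
                                       "node_coordinates": "x y"}},
    "x": {"dims": ("station",), "attrs": {"axis": "X"}},
    "y": {"dims": ("station",), "attrs": {"axis": "Y"}},
    "Precipitation": {"dims": ("time", "station"),
                      "attrs": {"geometry": "geometry"}},
}

# Hypothetical polygon example: node coordinates live on a separate node dimension.
polygons = {
    "geometry": {"dims": (), "attrs": {"geometry_type": "polygon",
                                       "node_count": "node_count",
                                       "node_coordinates": "x y"}},
    "node_count": {"dims": ("instance",), "attrs": {}},
    "x": {"dims": ("node",), "attrs": {"axis": "X"}},
    "y": {"dims": ("node",), "attrs": {"axis": "Y"}},
}

print(find_geometry_dimension(points))    # station
print(find_geometry_dimension(polygons))  # instance
```

In the polygon case the x and y variables are indexed by the node 
dimension, not by the geometry dimension, which is why the lookup has to 
go through node_count there.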

GetOGRGeometries() could be a member function of GDALDimension, 
returning NULL if no CF-compliant geometries can be constructed 
associated with the dimension.


At the end of this email are the two same examples but now with polygons 
rather than points.


(in the second example I posted earlier a 'geometry:node_coordinates = 
"x y";' is missing)




So all in all, I believe it should be doable but the benefit/effort 
ratio doesn't seem to me to be super favorable.


I see your point; for me it's a chicken and egg problem. How do we now 
handle the output when querying a data cube with monthly temperature 
projections for the next 30 years for a set of climate models and a set 
of scenarios at a set of given point locations, or get the maximum 
temperature over a set of given regions? I think there's a big gap now 
between the GIS community and the modelling communities; your MDA API 
helps close it, and we can do better.


I'll talk more about this next week Thu at our FOSS4G presentation, and 
hope to meet you there IRL!


Many regards,



Even

Le 16/08/2022 à 12:47, Edzer Pebesma a écrit :
I've been using GDAL's C++ multidimensional array (MDA) API lately to 
read and write data cubes from R - excellent work! I was looking into 
support for vector data cubes, multidimensional arrays with a single 
dimension associated with a set of geometries.


What we can do is read and write dimensions, variables, and 
attributes, but what (as far as I can tell) is missing is to read and 
write OGRGeometry variables, using the CF specification for geometry 
[1], which is supported by the NetCDF vector driver. Would it be 
feasible to read and write CF-compliant geometries from 
multidimensional arrays through the MDA API? Right now software 
writers have to sort this out themselves, which is straightforward for 
points but kludgy for polygons. Another issue is CRS handling, which I 
think could be done gracefully using the OGRGeometry interface.


As a minimal data example: I added two ncdump files below for two time 
instances and two POINT geometries. The first can be read and written 
by the OGR API, but is a one-dimensional array that lacks the time 
information of time series (time distributed over attributes/columns). 
The second is a 2 x 2 multidimensional array with the time 
information. GDAL's MDA API can read it but doesn't recognize 
geometries, GDAL's OGR API cannot read it (obviously). I'd like to be 
able to read (and write) the station dimension of the second one as an 
OGRGeometry, for points, lines or polygons.

Re: [gdal-dev] reading/writing geometries using GDAL's MDA API

2022-08-16 Thread Even Rouault

Edzer,

The currently supported data types for variables (including indexing 
variables of dimensions) or attributes are either numeric (byte, int, 
float, etc.), strings or compound data types of the previous types. After a 
quick think, I can't see strong reasons why OGRGeometry couldn't 
be added, barring some changes in GDAL core (looking at the specifics of 
string support should give good hints about where to make changes).


That said, it would probably only be used by the netCDF driver, so it is a 
bit hard to justify adding a new abstraction for just one user 
(user=driver). And implementing that in the netCDF driver wouldn't 
necessarily be easy as it is already quite complicated and I'm not sure 
how reusable the part that deals with SG geometries is. On your examples, 
it is not immediately obvious to me how the driver would expose an 
OGRGeometry data type. In the first one, it would have to synthesize a 
t2_coord variable from t2_coordX and t2_coordY? And in the second one an 
xy pseudo variable from x and y? But it would be a bit convoluted for 
the users of the multidimensional indexed Precipitation to figure out 
that stations has a variable indexed by stations that has geometries.


So all in all, I believe it should be doable but the benefit/effort 
ratio doesn't seem to me to be super favorable.


Even

Le 16/08/2022 à 12:47, Edzer Pebesma a écrit :
I've been using GDAL's C++ multidimensional array (MDA) API lately to 
read and write data cubes from R - excellent work! I was looking into 
support for vector data cubes, multidimensional arrays with a single 
dimension associated with a set of geometries.


What we can do is read and write dimensions, variables, and 
attributes, but what (as far as I can tell) is missing is to read and 
write OGRGeometry variables, using the CF specification for geometry 
[1], which is supported by the NetCDF vector driver. Would it be 
feasible to read and write CF-compliant geometries from 
multidimensional arrays through the MDA API? Right now software 
writers have to sort this out themselves, which is straightforward for 
points but kludgy for polygons. Another issue is CRS handling, which I 
think could be done gracefully using the OGRGeometry interface.
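
To illustrate the kind of bookkeeping that software writers currently do 
themselves for polygons, here is a minimal Python sketch, assuming the 
simplest CF layout: flat x and y node arrays plus a node_count array 
giving the number of nodes per polygon. The function name 
polygons_to_wkt and the sample numbers are hypothetical, and the sketch 
ignores multipart geometries and holes (part_node_count, interior_ring) 
as well as the CRS.

```python
def polygons_to_wkt(x, y, node_count):
    """Build one WKT POLYGON per entry of node_count from flat node arrays."""
    wkts = []
    start = 0
    for n in node_count:
        # Slice this polygon's nodes out of the flat coordinate arrays.
        ring = list(zip(x[start:start + n], y[start:start + n]))
        # WKT rings must be closed; repeat the first node if it is not there yet.
        if ring and ring[0] != ring[-1]:
            ring.append(ring[0])
        coords = ", ".join(f"{px:g} {py:g}" for px, py in ring)
        wkts.append(f"POLYGON (({coords}))")
        start += n
    return wkts


# Hypothetical data: a triangle followed by a square.
x = [0, 1, 1, 2, 3, 3, 2]
y = [0, 0, 1, 2, 2, 3, 3]
node_count = [3, 4]

for wkt in polygons_to_wkt(x, y, node_count):
    print(wkt)
```

With the GDAL Python bindings installed, each string could then be 
turned into an OGRGeometry with ogr.CreateGeometryFromWkt(), but the 
caller still has to track the offsets into the node arrays and attach 
the CRS separately.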


As a minimal data example: I added two ncdump files below for two time 
instances and two POINT geometries. The first can be read and written 
by the OGR API, but is a one-dimensional array that lacks the time 
information of time series (time distributed over attributes/columns). 
The second is a 2 x 2 multidimensional array with the time 
information. GDAL's MDA API can read it but doesn't recognize 
geometries, GDAL's OGR API cannot read it (obviously). I'd like to be 
able to read (and write) the station dimension of the second one as an 
OGRGeometry, for points, lines or polygons.


[1] 
https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#geometries


--
http://www.spatialys.com
My software is free, but my time generally not.
