Re: [gdal-dev] HDF5 and identified fields / primary dimension
> For that particular file, I see that the "feature_id" variable
> (corresponding to the "feature_id" dimension) has a cf_role =
> "timeseries_id" attribute, and that the global metadata has a
> featureType = "timeSeries" attribute. So given
> https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#coordinates-metadata
> , this seems to be relatively standardized, and in that case the
> heuristics could be improved to recognize that the main dimension is
> feature_id (probably with a test that the size of the time dimension is
> 1). As far as I can see/remember, the vector layer support in netCDF
> was originally developed for the featureType=point and profile use cases,
> so some tuning for timeseries isn't unexpected.

Thanks! I've made *some* progress, the deepest I've been down in that file ...
I hope to be able to craft these suggestions at some point.

Cheers, Mike

> Even
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.

--
Michael Sumner
Software and Database Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsum...@gmail.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Re: [gdal-dev] HDF5 and identified fields / primary dimension
Michael,

> Warning 1: The dataset has several variables that could be identified as
> vector fields, but not all share the same primary dimension. Consequently
> they will be ignored.

Yes, the driver is super conservative/picky when trying to recognize a netCDF file as a vector layer, and its heuristics will return an error if there is any ambiguity.

> I've seen similar cases in other files. I presume the driver could be
> updated to 1) choose the primary dimension and read the values while
> ignoring the others, 2) user-specify the dimension to include, or
> 3) user-specify the fields to exclude

I guess option 2 could be reasonable as an open option.

For that particular file, I see that the "feature_id" variable (corresponding to the "feature_id" dimension) has a cf_role = "timeseries_id" attribute, and that the global metadata has a featureType = "timeSeries" attribute. So given
https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#coordinates-metadata
, this seems to be relatively standardized, and in that case the heuristics could be improved to recognize that the main dimension is feature_id (probably with a test that the size of the time dimension is 1). As far as I can see/remember, the vector layer support in netCDF was originally developed for the featureType=point and profile use cases, so some tuning for timeseries isn't unexpected.

Or maybe, if in the set of dimensions there is only one with > 1 sample and the other ones are at 1, consider only the one with > 1 sample.

Even

--
http://www.spatialys.com
My software is free, but my time generally not.
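The two heuristics floated above (the CF cf_role/featureType hint, and "only one dimension has > 1 sample") can be sketched in plain Python. This is only an illustration of the logic, not the netCDF driver's code: the function name `pick_primary_dimension` and the dict-based inputs are hypothetical, and the check is applied to every other dimension rather than just "time", which is an assumption on my part.

```python
from typing import Dict, Optional, Tuple


def pick_primary_dimension(
    dim_sizes: Dict[str, int],
    var_dims: Dict[str, Tuple[str, ...]],
    var_attrs: Dict[str, Dict[str, str]],
    global_attrs: Dict[str, str],
) -> Optional[str]:
    """Return the primary dimension name, or None if the choice is ambiguous."""
    # Heuristic 1: featureType = "timeSeries" plus a 1D variable carrying
    # cf_role = "timeseries_id" points at the main dimension.
    if global_attrs.get("featureType", "").lower() == "timeseries":
        for name, attrs in var_attrs.items():
            dims = var_dims.get(name, ())
            if attrs.get("cf_role") == "timeseries_id" and len(dims) == 1:
                # Safety test: all other dimensions (e.g. time) must have
                # size 1. Checking *all* of them is an assumption here.
                others = [d for d in dim_sizes if d != dims[0]]
                if all(dim_sizes[d] == 1 for d in others):
                    return dims[0]
    # Heuristic 2: exactly one dimension has more than one sample.
    non_degenerate = [d for d, size in dim_sizes.items() if size > 1]
    if len(non_degenerate) == 1:
        return non_degenerate[0]
    return None


# Layout as described in this thread: a long feature_id dimension and a
# length-1 time dimension holding "time" and "reference_time".
dim_sizes = {"feature_id": 2729077, "time": 1}
var_dims = {
    "feature_id": ("feature_id",),
    "elevation": ("feature_id",),
    "time": ("time",),
    "reference_time": ("time",),
}
var_attrs = {"feature_id": {"cf_role": "timeseries_id"}}
global_attrs = {"featureType": "timeSeries"}

print(pick_primary_dimension(dim_sizes, var_dims, var_attrs, global_attrs))
# prints "feature_id"
```

Returning None in the ambiguous case keeps the current conservative behaviour: the caller would still emit the warning and expose no layer.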
Re: [gdal-dev] HDF5 and identified fields / primary dimension
Well actually, I think what I'm asking for is the intended behaviour, but there's an error. Is it meant to detect sets of variables on 1D dimensions and present them as layers? That's what would make sense to me. Still exploring.

Cheers, Mike

On Tue, Apr 2, 2024 at 5:36 AM Michael Sumner wrote:

> This source has an array on 'feature_id' with 2729077 values, with various
> fields
>
> elevation, longitude, latitude, qBtmVertRunoff, qBucket, etc
>
> '/vsis3/noaa-nwm-retro-v2.0-pds/full_physics/2017/20170401.CHRTOUT_DOMAIN1.comp'
>
> It is accessible via the mdim api.
>
> Structurally it is basically a table with rows per feature_id and columns
> per field, but it has a length-1 pair of fields "time" and
> "reference_time" defined on dimension time. This is like a single time step
> per file (like an unlimited dimension in the classic 2D case).
>
> Accessing with the vector API reports that it can't treat this as a table
> because of those time values that don't match the feature_id dimension:
>
> ogrinfo NETCDF:'/vsis3/noaa-nwm-retro-v2.0-pds/full_physics/2017/20170401.CHRTOUT_DOMAIN1.comp' -ro
>
> Warning 1: The dataset has several variables that could be identified as
> vector fields, but not all share the same primary dimension. Consequently
> they will be ignored.
>
> I've seen similar cases in other files. I presume the driver could be
> updated to 1) choose the primary dimension and read the values while
> ignoring the others, 2) user-specify the dimension to include, or
> 3) user-specify the fields to exclude
>
> So:
>
> - is there a workaround to enable the vector driver to focus on the
>   primary dimension?
> - would a PR along those lines have to consider greater difficulties than
>   applying the proposed updates to arrays using the primary dimension only?
>   I'd only consider this for strictly 1D arrays.
> - degenerate dimensions could be used to copy-out the value of the other
>   dims (I'd consider this an optional extra)
>
> (It's a bit special-case-y, you wouldn't want to go to multi-arrays and
> have them flatten out multi-dims in a general way, I think, but degenerate
> dimensions might be worth consideration)
>
> Appreciate any thoughts, thanks! I'd quite like to have the
> vector-approach work as well as the mdim approach, I think they are nicely
> complementary and provide different pros and cons.
>
> Cheers, Mike
>
> --
> Michael Sumner
> Software and Database Engineer
> Australian Antarctic Division
> Hobart, Australia
> e-mail: mdsum...@gmail.com

--
Michael Sumner
Software and Database Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsum...@gmail.com
[gdal-dev] HDF5 and identified fields / primary dimension
This source has an array on 'feature_id' with 2729077 values, with various fields

  elevation, longitude, latitude, qBtmVertRunoff, qBucket, etc

  '/vsis3/noaa-nwm-retro-v2.0-pds/full_physics/2017/20170401.CHRTOUT_DOMAIN1.comp'

It is accessible via the mdim api.

Structurally it is basically a table with rows per feature_id and columns per field, but it has a length-1 pair of fields "time" and "reference_time" defined on dimension time. This is like a single time step per file (like an unlimited dimension in the classic 2D case).

Accessing with the vector API reports that it can't treat this as a table because of those time values that don't match the feature_id dimension:

  ogrinfo NETCDF:'/vsis3/noaa-nwm-retro-v2.0-pds/full_physics/2017/20170401.CHRTOUT_DOMAIN1.comp' -ro

  Warning 1: The dataset has several variables that could be identified as vector fields, but not all share the same primary dimension. Consequently they will be ignored.

I've seen similar cases in other files. I presume the driver could be updated to 1) choose the primary dimension and read the values while ignoring the others, 2) user-specify the dimension to include, or 3) user-specify the fields to exclude

So:

- is there a workaround to enable the vector driver to focus on the primary dimension?
- would a PR along those lines have to consider greater difficulties than applying the proposed updates to arrays using the primary dimension only? I'd only consider this for strictly 1D arrays.
- degenerate dimensions could be used to copy-out the value of the other dims (I'd consider this an optional extra)

(It's a bit special-case-y, you wouldn't want to go to multi-arrays and have them flatten out multi-dims in a general way, I think, but degenerate dimensions might be worth consideration)

Appreciate any thoughts, thanks! I'd quite like to have the vector-approach work as well as the mdim approach, I think they are nicely complementary and provide different pros and cons.
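To make the "table plus degenerate dimensions" idea concrete, here is a rough pure-Python sketch. It is not GDAL code: `arrays_to_rows` is a hypothetical helper, the input is a plain dict of name -> (dimension names, values), and the sample values (including the "T0" time placeholder) are made up for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

ArraySpec = Tuple[Tuple[str, ...], Sequence[Any]]


def arrays_to_rows(
    primary_dim: str,
    dim_sizes: Dict[str, int],
    arrays: Dict[str, ArraySpec],
) -> List[Dict[str, Any]]:
    """Build one row per index of primary_dim from strictly 1D arrays."""
    n = dim_sizes[primary_dim]
    columns: Dict[str, List[Any]] = {}
    for name, (dims, values) in arrays.items():
        if dims == (primary_dim,):
            # Regular field: one value per feature.
            columns[name] = list(values)
        elif len(dims) == 1 and dim_sizes[dims[0]] == 1:
            # Degenerate (length-1) dimension: copy its single value out
            # to every row (the "optional extra" from the list above).
            columns[name] = [values[0]] * n
        # Anything else (multi-dimensional arrays, other long dimensions)
        # is skipped rather than flattened.
    return [{name: col[i] for name, col in columns.items()} for i in range(n)]


# Tiny made-up stand-in for the real file (which has 2729077 features).
dim_sizes = {"feature_id": 3, "time": 1}
arrays = {
    "feature_id": (("feature_id",), [101, 102, 103]),
    "elevation": (("feature_id",), [12.5, 40.0, 7.25]),
    "time": (("time",), ["T0"]),
}

rows = arrays_to_rows("feature_id", dim_sizes, arrays)
print(rows[0])
# prints {'feature_id': 101, 'elevation': 12.5, 'time': 'T0'}
```

Skipping everything that is not strictly 1D keeps this in line with the "only strictly 1D arrays" restriction, and avoids flattening multi-dims in a general way.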

Cheers, Mike

--
Michael Sumner
Software and Database Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsum...@gmail.com