This seems to have become a discussion specifically about the datum.

This issue was discussed at length at the GO-ESSP meeting in Asheville.
It seems reasonable to conclude on the basis of those discussions, and
the current ones about GCMs, that some datasets are simply too
imprecise for the datum to matter, while for others it might be
critical.

Perhaps the best approach is still for the datum to remain optional,
leaving it to the end user or application to decide what to do when
operating on two datasets where one specifies a datum and the other
doesn't, or where both do but the datums differ. A quick
implementation might be to emit a warning and keep going:-).
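
A minimal sketch of that warn-and-continue behaviour, in Python. The
function name compare_datums and the labels it returns are invented for
illustration, and datums are represented as plain strings (or None when
a dataset specifies none) rather than being read from real CF
attributes:

```python
import warnings


def compare_datums(datum_a, datum_b):
    """Classify a pair of optional datum names and warn on ambiguity.

    Each argument is a datum name (e.g. "WGS84") or None when the dataset
    does not specify one. Nothing is raised: the caller keeps going.
    The returned labels are hypothetical, chosen for this sketch only.
    """
    if datum_a is None and datum_b is None:
        # Neither dataset specifies a datum (e.g. two GCM outputs).
        return "unspecified"
    if datum_a is None or datum_b is None:
        known = datum_a if datum_a is not None else datum_b
        warnings.warn(
            f"only one dataset specifies a datum ({known}); "
            "the other is assumed to be compatible",
            UserWarning,
            stacklevel=2,
        )
        return "missing"
    if datum_a != datum_b:
        warnings.warn(
            f"datums differ ({datum_a} vs {datum_b}); "
            "no transformation applied",
            UserWarning,
            stacklevel=2,
        )
        return "mismatch"
    return "match"
```

An application could call this before combining two datasets and carry
on regardless of the result, so the user at least sees that an
assumption was made.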

This may go some way to meeting David's requirement that end users "know
they are making assumptions", but I don't see an alternative to letting
the coordinates of a dataset without a datum be treated as having
whatever reference datum the end user would like them to have. It would
make no sense to require GCM output to mention a datum at all. We have
been known to run coupled models where different component models use
different values for the radius of a spherical Earth:-).

David Blodgett writes:

Jonathan and Karl,

I don't disagree with the GCM examples or lack of model resolution being talked 
about here; I just don't think that is the only argument.

There are cases, like downscaled climate projections, reanalysis products, or 
radar indicated rainfall, where a datum is critical to interpreting the data 
accurately. My concern is that the justified disregard for datums at coarse 
resolution (and lack of requirement for it in the spec) fosters a lack of 
awareness that datums become critically important at finer scales.

Jonathan, you have a point that an analyst should be free to make their own 
assumptions. However, I would prefer that they know they are making 
assumptions. It would be helpful if the CF specification paid some tribute to 
this issue rather than treating lat/lon coordinates as base-equivalent 
coordinates.

Are there any arguments against CF recommending a standard datum assumption 
when intersecting data without a datum specified with data that does have a 
datum specified?

Cheers,

Dave

On Jul 27, 2011, at 12:14 PM, Karl Taylor wrote:

Hi all,

(I think the horse still shows a few signs of life)

what I'm arguing is that if the results the scientists are using come from a 
GCM, then although the two scientists got differences, those differences (no 
matter how large) should not be considered significant in terms of their 
reliability (as opposed to in some statistical sense).  Just as one wouldn't 
rely on a climate model to predict global mean temperature to within 0.001 K 
(and surely it would be silly and perhaps misleading to report temperatures to 
this precision) one shouldn't expect to pin down the *location* of the grid 
cell temperature reported by a GCM to a point given with higher precision than 
the spacing of the grid cells (at least when comparing with observations).

I will grant you that at some time far in the future, it is possible our 
models' resolution and accuracy will have improved to the point that we might 
have to alter the precision with which we report the locations of their output 
values, but we're not there yet.

Best regards,
Karl



On 7/27/11 9:23 AM, David Blodgett wrote:

Not to beat a dead horse, but this issue has been a huge stumbling block in our 
work to integrate data and software across the climate and geographic 
communities.

The argument here is: since CF data is usually so coarse and low precision, 
complete geolocation metadata should not be required.

An example of why this matters: Two scientists take the same downscaled climate 
data that doesn't have a datum specification and import it into their 
application. One application assumes one datum, the other assumes another 
datum. Scientist 1's results differ from scientist 2's results. In situations 
where there are steep gradients in downscaled data, these differences may be 
substantial.

One solution would be to adopt a default datum for data lacking datum 
definition. So, given a file that uses lat/lon and claims to follow the CF spec, a 
scientist could follow the spec's guidance on what datum to assume. Without this 
type of guidance or a requirement to include the information, lat/lon without a 
datum amounts to providing any other value without units.

Dave

On Jul 27, 2011, at 10:59 AM, Karl Taylor wrote:

Dear all,

another view:

Can't remember *all* the issues here, but certainly reporting the latitude and 
longitude points for GCM grids without further precision (e.g., information on 
the figure of the Earth) is sufficient for any comparison with observations.  
Only certain (usually prescribed) conditions at the earth's surface (e.g., 
surface height) coming from a GCM should be trusted at the individual grid 
point scale, and no sub-grid scale information is directly available from the 
GCM (normally).  So, even if a station is near the boundary of a GCM's 
grid cell, it should hardly matter which of the adjacent grid cells you 
compare it to.  The GCM sort of gives you a grid cell average value that 
applies to some region in the vicinity of the cell.  So, it doesn't matter 
where you think it is precisely located.

Down-scaled output from the GCM will be at higher resolution, but again 
the original data doesn't apply at a point but to a general region (usually 
quite a bit larger than 12 km, and even if it weren't we wouldn't believe stuff 
going on at that scale), so where the cell is exactly located again doesn't 
matter.

best regards,
Karl


On 7/27/11 4:38 AM, David Blodgett wrote:

Without the grid_mapping, the lat and lon still make sense in the common case
(and original CF case) of GCM data, and in many other cases, the intended
usage of the data does not require precision about the figure of the Earth.
Although this metadata could be valuable if it can be defined, I think it would
be too onerous to require it.

I hope to present on this very issue at AGU. The problem we see with ambiguous 
definition of datums is a cascade of non-recognition of datums through 
processing algorithms and in the output of some processes that generate very 
detailed data.

The prime example is downscaled climate data. Because the climate modelers 
involved generally consider lat/lon to be a lowest common denominator, the 
datum used to geolocate historical data (like rain gages) is neglected. What 
results is, in our case, a 1/8deg (12km) grid with no datum. This is 
unacceptable, as at this resolution a wrong assumption of datum 
for the grid can cause very substantial (a full grid cell or more) geolocation 
errors.
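
To give a rough sense of the magnitudes involved, here is a sketch in
Python of one illustrative case only: the same latitude value read as a
geocentric latitude (spherical Earth) versus a geodetic latitude on the
WGS84 ellipsoid. The choice of WGS84, the figure of about 111.2 km per
degree of latitude, and the function names are assumptions made for
this example, not a description of any particular dataset:

```python
import math

WGS84_F = 1 / 298.257223563         # WGS84 flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared
KM_PER_DEG_LAT = 111.2              # approximate km per degree of latitude


def geocentric_latitude(geodetic_lat_deg):
    """Geocentric latitude (degrees) of a point on the ellipsoid surface
    whose WGS84 geodetic latitude is geodetic_lat_deg."""
    phi = math.radians(geodetic_lat_deg)
    return math.degrees(math.atan((1 - WGS84_E2) * math.tan(phi)))


def latitude_shift_km(lat_deg):
    """Approximate north-south offset (km) between the geodetic and
    geocentric readings of the same latitude value."""
    diff_deg = lat_deg - geocentric_latitude(lat_deg)
    return abs(diff_deg) * KM_PER_DEG_LAT


# Offsets at a few mid-range latitudes (the poles are avoided because
# tan() is unbounded there).
for lat in (0, 30, 45, 60):
    print(f"{lat:2d} deg: {latitude_shift_km(lat):5.1f} km")
```

Under these assumptions the offset peaks near 45 degrees latitude at
about 0.19 degrees, or roughly 21 km, which is larger than one cell of a
12 km grid.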

If the CF community intends to consume any ground-based data, then datums must 
be preserved from ingest of ground-based forcing throughout data storage and 
processing. This is fundamental information that is required for ALL data 
comparison operations.

I would argue that CF compliance should require this information. This puts the 
requirement to make metadata assumptions on data publishers/producers rather 
than data consumers. It is unacceptable to have different data consumers making 
different assumptions of geolocation on the same data.

Off soapbox.

Dave Blodgett
Center for Integrated Data Analytics (CIDA)
USGS WI Water Science Center
8505 Research Way Middleton WI 53562
608-821-3899 | 608-628-5855 (cell)
http://cida.usgs.gov


On Jul 26, 2011, at 5:24 AM, Jonathan Gregory wrote:

Dear all

For datasets which are intended for analysis by end-users I think it would be
undesirable to remove the requirement of providing explicit lat and lon
coords even if a grid_mapping is provided. I think it is unrealistic to expect
all software which someone might use to analyse netCDF files to be able to
recognise and act upon all possible values of the CF grid_mapping attribute,
and without the lat and lon information the user would have a problem. If the
issue is storage space in the file I think the much better choice is to store
the explicit coordinates in another file, by extending the CF convention to
allow datasets to be distributed over several linked files, as gridspec does
for example.

Steve appears to suggest that grid_mapping is required in some circumstances,
but I don't think it is at present. However, the text Steve quotes may not be
quite right:

  "/When the coordinate variables for a horizontal grid are not
  longitude and latitude,*_it is required that the true latitude and
  longitude coordinates be supplied_* via the coordinates attribute/."

The text should make it clear that this requirement applies when the data has a
geolocated horizontal grid. It doesn't necessarily apply to idealised cases.
We could clarify this with a defect ticket.

Cheers

Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

--

V. Balaji                               Office:  +1-609-452-6516
Head, Modeling Systems Group, GFDL      Home:    +1-212-253-6662
Princeton University                    Email: v.bal...@noaa.gov