Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file
Dear Phil,

a programmer usually has to decide whether or not to accept 'minor' problems with the input. A program user usually selects the program that accepts the most input. So a CF-processing program that rejects a dataset without a datum won't become very popular. Of course, a program that allows the user to modify or correct the datum is even more popular.

Climate and forecast modellers have for a very long time used a spherical earth with a radius of around 6371 km (= GRS1980 Authalic Sphere, not the GRS1980 ellipsoid). Therefore, all CF-processing programs I know (e.g. netcdf-java, ncview, fimex) assume a spherical earth with a radius of that size. So the proposed default is the de facto standard, and I believe it is better to write it down than to keep it informal.

If datum information is merely an extension to CF, it will take very long until it is accepted. If CF always comes with a datum, data providers will adapt more quickly, since a missing datum might be an obvious error (high priority) rather than a missing extension (low priority).

Setting a default for something previously undefined is very common. A precedent that meant the same amount of work (an update of attributes) for all non-English data providers was the definition of UTF-8 as the character encoding in netcdf-3.6.3 in ~2008.

Best regards,
Heiko

On 2011-08-03 15:48, Bentley, Philip wrote:
> Dear Heiko,
>
>> I hope CF could define a default datum, e.g. the GRS1980 Authalic Sphere, since this matches most closely with existing software (netcdf-java). This would make life easier for the software developers who have to use something if nothing is given.
>
> I'm not sure that defining a default datum for CF is the right way to go in this instance.
> I would have thought that if a particular piece of data analysis is at a resolution that requires a geodetic datum to be specified then, in the absence of the actual one being defined in metadata, it's not clear to me that using some semi-arbitrary, and potentially invalid, default datum is any better than giving the user the opportunity to select the one s/he believes to be the most appropriate for the task in hand.
>
> The current CF conventions include a (fairly minimal) set of metadata attributes which can be used to describe the basic properties of the coordinate reference system associated with a given dataset. The onus then is on data producers to utilise those metadata attributes to describe their data to the fullest extent possible. Furthermore, other non-CF attributes may be used to augment the standard set - over time some of these additional attributes would no doubt find their way into the CF specification.
>
> Ultimately, if end-users consider that a given dataset has insufficient metadata to justify its use within a particular context, then they can always choose to ignore that dataset. With the passage of time - and in true Darwinian fashion - such datasets (and their producers) will find that they are increasingly disregarded or overlooked in analyses. Hopefully this would galvanise such data producers into improving the quality of their spatial metadata!
>
> Regards,
> Phil
>
> PS: if a default datum were to be encoded into the CF conventions, I'd imagine that the WGS84 datum would be the way to go rather than GRS80 which, if I understand correctly, has somewhat more of a bias towards use over the North American continent. That said, I suspect the differences between the 2 datums are sufficiently small as to get lost in the underflow for many metocean research applications.

___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file
Jon,

just to make you think twice: how will ncWMS display data without a datum? Using netcdf-java's default?

I think 1) and 2) are both valid, and I propose something like:

    All geo-data must come with the best datum information.
    Missing ellipsoid information will be interpreted as a 6371 km sphere.
    Missing Bursa-Wolf parameters will be interpreted as all 0.

This would discourage using the default, but at least it is written down. And the default ellipsoid is discouraging enough in itself for anybody who cares about ellipsoids, yet nicely backwards compatible.

Heiko

On 2011-08-03 16:30, Jon Blower wrote:
> John -
>
>> 1) possible recommendations by CF to always include the datum, and making sure we have the right metadata to do so.
>> 2) possible recommendations by CF as to what to do if the datum is not present, or is only partially specified.
>
> yes I think that's right. Personally I'm not in favour of a default datum - if the datum is unknown this should be expressed clearly.
>
> Jon
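For readers who want to see what Heiko's proposed wording could mean for client software, here is a minimal Python sketch. It is only an illustration under stated assumptions: the attributes are held in a plain dict, `earth_radius` and `semi_major_axis` follow the CF grid-mapping attribute names, while the `bursa_wolf` key, the function name and the rounded 6371 km default are hypothetical and not taken from CF or from any of the programs mentioned.

```python
# Illustrative sketch only. DEFAULT_EARTH_RADIUS_M is a rounded stand-in for
# the "6371 km sphere" of the proposal, and "bursa_wolf" is a hypothetical key.
DEFAULT_EARTH_RADIUS_M = 6_371_000.0
DEFAULT_BURSA_WOLF = (0.0,) * 7  # 3 shifts, 3 rotations, 1 scale: all zero

ELLIPSOID_KEYS = ("earth_radius", "semi_major_axis")


def resolve_datum(grid_mapping_attrs):
    """Return (datum, defaulted): a completed copy of the attributes and the
    list of keys that had to be filled in with the proposed defaults."""
    datum = dict(grid_mapping_attrs or {})
    defaulted = []
    # Missing ellipsoid information -> assume the default sphere.
    if not any(key in datum for key in ELLIPSOID_KEYS):
        datum["earth_radius"] = DEFAULT_EARTH_RADIUS_M
        defaulted.append("earth_radius")
    # Missing Bursa-Wolf parameters -> assume all zeros.
    if "bursa_wolf" not in datum:
        datum["bursa_wolf"] = DEFAULT_BURSA_WOLF
        defaulted.append("bursa_wolf")
    return datum, defaulted


if __name__ == "__main__":
    datum, defaulted = resolve_datum({})
    print(datum["earth_radius"], defaulted)
```

Returning the list of defaulted keys lets a client tell the user that a default was applied, which fits the intent that the default be discouraged rather than silent.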
Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file
Dear all,

I agree with Balaji about this:

> It is perfectly possible to configure a GCM out of components with inconsistent datums. ... I still feel my principal objection to specifying the datum is the implication of precision that does not exist.

This kind of argument may apply more to models, but it appears from the comments that it might apply to obs datasets as well, for instance when they have been assembled from various sources, which could be poorly documented. Hence, in terms of John's questions, I think this is good:

> 1) possible recommendations by CF to always include the datum, and making sure we have the right metadata to do so.

We could make a recommendation in the standard document to include the grid_mapping, if there is an applicable one, whenever there are horizontal coordinates. The CF checker would then produce a warning if the grid_mapping was absent.

But I don't think it is the role of CF to include:

> 2) possible recommendations by CF as to what to do if the datum is not present, or is only partially specified.

CF is a convention for allowing metadata to be provided. Its purpose is not to prescribe or guess what the metadata should be. There is lots of important metadata which is optional, such as standard names and bounds, and we do not recommend what should be done when they are missing.

Cheers
Jonathan
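The checker behaviour Jonathan describes (warn when horizontal coordinates are present but grid_mapping is absent) might look roughly like the sketch below. This is not the CF checker's code. It assumes a hypothetical in-memory layout (name -> dims and attrs), recognises horizontal coordinates only by four standard names, and cannot judge whether a grid mapping is actually "applicable".

```python
import warnings

# Assumption: horizontal coordinates are recognised by these standard names.
HORIZONTAL_STANDARD_NAMES = {
    "latitude",
    "longitude",
    "projection_x_coordinate",
    "projection_y_coordinate",
}


def check_grid_mapping(variables):
    """Warn for each data variable that uses horizontal coordinates but has
    no grid_mapping attribute.

    `variables` maps a name to {"dims": [...], "attrs": {...}} (hypothetical
    layout). Returns the names of the variables that triggered a warning.
    """
    horizontal = {
        name
        for name, var in variables.items()
        if var["attrs"].get("standard_name") in HORIZONTAL_STANDARD_NAMES
    }
    flagged = []
    for name, var in variables.items():
        if name in horizontal:
            continue  # the coordinate variables themselves are not checked
        # Coordinates are referenced via dimensions or the coordinates attribute.
        referenced = set(var["dims"])
        referenced |= set(var["attrs"].get("coordinates", "").split())
        if referenced & horizontal and "grid_mapping" not in var["attrs"]:
            warnings.warn(f"{name}: horizontal coordinates but no grid_mapping")
            flagged.append(name)
    return flagged
```

Because it only warns, a file without a grid_mapping would still pass, which matches the "recommendation, not requirement" position taken in the message.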
Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file
Hi Jonathan, all,

> There is lots of important metadata which is optional, such as standard names

I think the case of a missing datum is different from the case of a missing standard name. In order to plot data in a GIS (or similar system that allows different datasets to be compared or overlain spatially) you *always* need a datum. If one isn't supplied by the data provider, the user or client software has to invent or guess one. That's not true for a standard name (although arguably it is true for the coordinate bounds).

Hence Heiko's comment about what ncWMS does - in fact ncWMS assumes WGS84 for lat-lon coordinates, but perhaps it should assume something different. It *does* make a difference in lots of cases - both model and obs data.

It seems we're all agreeing that CF should at least have the right tags to ensure that a datum *can* be well specified, for 1D grid coordinates, projected grid coordinates, 2D curvilinear grid coordinates and geolocation of observations. Would you agree?

What seems to be under debate is what happens when the datum is genuinely (and regrettably) not known or incompletely specified (or perhaps where the dataset has been derived from mixed datums). The proposals seem to be:

1. Rule this out of scope for CF and don't provide any datum information at all.
2. Invent CF tags that explicitly say the datum is unknown, and perhaps provide a reason (to give the user a bit more information for his/her interpretation).
3. Specify a default datum (noting that client software or the user will commonly have to invent one anyway).

Is this a fair summary of the situation?
Best wishes,
Jon
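To make the difference between the three proposals concrete for a client, the following Python sketch classifies what a file says about its datum. It is an illustration only: the attribute names `datum_status` and `datum_unknown_reason` are invented here for proposal 2 and are not taken from the CF conventions, and the test for "specified" is deliberately crude.

```python
from enum import Enum


class DatumStatus(Enum):
    SPECIFIED = "specified"                # ellipsoid information is present
    DECLARED_UNKNOWN = "declared_unknown"  # proposal 2: provider says "unknown"
    ABSENT = "absent"                      # nothing stated at all


# Hypothetical attribute names, invented for this sketch.
UNKNOWN_FLAG_ATTR = "datum_status"
UNKNOWN_REASON_ATTR = "datum_unknown_reason"
ELLIPSOID_ATTRS = ("earth_radius", "semi_major_axis")


def classify_datum(grid_mapping_attrs):
    """Return (status, reason) for a dict of grid-mapping attributes."""
    attrs = grid_mapping_attrs or {}
    if attrs.get(UNKNOWN_FLAG_ATTR) == "unknown":
        return DatumStatus.DECLARED_UNKNOWN, attrs.get(UNKNOWN_REASON_ATTR)
    if any(key in attrs for key in ELLIPSOID_ATTRS):
        return DatumStatus.SPECIFIED, None
    return DatumStatus.ABSENT, None
```

Under proposal 1 a client would have to decide for itself what to do with ABSENT; under proposal 2 it could show the stated reason to the user; under proposal 3 it would treat ABSENT as the agreed default.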
Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file
I think this is a good summary. More comments below:

On 8/4/2011 5:15 AM, Jon Blower wrote:
> I think the case of a missing datum is different from the case of a missing standard name. In order to plot data in a GIS (or similar system that allows different datasets to be compared or overlain spatially) you *always* need a datum. If one isn't supplied by the data provider, the user or client software has to invent or guess one. That's not true for a standard name (although arguably it is true for the coordinate bounds).

Not just GIS: any operation that involves data integration needs to bring all the datasets in question to a common reference frame, and when it comes to a spatial reference frame, the datum is absolutely crucial. Data integration is the keyword here. Without a datum, the dataset is less significant for data integration. If the data provider expects the data to be used in any such operation, the datum should be provided. Of course, there are situations when the datum is genuinely not known even to the data provider. In such situations, I think it should be the data provider who guesses the datum after due diligence, not the user.

Upendra
Re: [CF-metadata] per-variable metadata?
Hi Jeff,

Each variable in a CF file may possess an ancillary_variables attribute that points to other variables related to it (http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#ancillary-data). To attach flags to a variable, use ancillary_variables to point to a variable that has flag_values and flag_meanings attributes (http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#flags).

We have started a discussion in the background about whether an example that illustrates this should be included in the CF documentation.

- Steve

On 7/14/2010 6:21 AM, Jeff deLaBeaujardiere wrote:
> In another discussion, Steve Hankin wrote:
>> CF generally favors attributes attached to variables over attributes attached to files
>
> This reminds me of a question I wanted to ask: does CF have any conventions regarding how to handle data that contains multiple observed quantities with different quality flags, comment fields or other attributes for each quantity?
>
> -Jeff DLB
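As a rough illustration of the flag mechanism Steve points to, the Python sketch below decodes a quality-flag variable using its flag_values and flag_meanings attributes. The variable names (`temp_qc`), the flag vocabulary and the dict layout are made up for the example. The one CF-defined detail relied on is that flag_meanings is a blank-separated string with one word per flag value.

```python
def decode_flags(values, flag_values, flag_meanings):
    """Map each flag value to its meaning.

    flag_meanings is the blank-separated string used by CF; values that do
    not appear in flag_values map to None.
    """
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings differ in length")
    lookup = dict(zip(flag_values, meanings))
    return [lookup.get(value) for value in values]


# Hypothetical file contents: a data variable pointing at its flag variable.
temp_attrs = {"ancillary_variables": "temp_qc"}
temp_qc = {
    "data": [1, 1, 3, 2],
    "attrs": {
        "flag_values": [1, 2, 3],
        "flag_meanings": "good questionable bad",
    },
}

decoded = decode_flags(
    temp_qc["data"],
    temp_qc["attrs"]["flag_values"],
    temp_qc["attrs"]["flag_meanings"],
)
```

Each observed quantity can point at its own flag variable in this way, which is how the per-quantity quality flags in Jeff's question would be kept apart.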
Re: [CF-metadata] per-variable metadata?
Hi Steve -

I'm very interested in the background discussion on this - any chance of bringing it into the foreground?

I'm using ancillary variables in 2-D in situ data files to describe instruments and things like precision, accuracy, sample scheme, etc. For temperature files from moorings where different sensor types are at different depths, I'd like to use something like

    TEMP:ancillary_variables = "Instrument_manufacturer Instrument_model Instrument_sample_scheme Instrument_serial_number TEMP_qc_procedure TEMP_accuracy TEMP_precision TEMP_resolution" ;

and then

    short INST_SN(depth) ;
        INST_SN:long_name = "instrument_serial_number" ;
    ... etc., etc.

If there's going to be a standard way to do this, I'd really like to know about it - sooner rather than later.

Thanks - Nan

--
Nan Galbraith                          (508) 289-2444
Upper Ocean Processes Group            Mail Stop 29
Woods Hole Oceanographic Institution
Woods Hole, MA 02543
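A small Python sketch of how a generic reader might follow an ancillary_variables list like the one Nan describes. It assumes a hypothetical in-memory layout (name -> dims, attrs, data) and invented serial numbers. It only shows splitting the blank-separated attribute and reporting names that do not resolve to a variable in the file, and is not a proposed standard.

```python
def resolve_ancillary(variables, name):
    """Split the blank-separated ancillary_variables attribute of `name`.

    Returns (found, missing): a dict of the listed variables that exist in
    `variables`, and a list of listed names that do not.
    """
    listed = variables[name]["attrs"].get("ancillary_variables", "").split()
    found = {n: variables[n] for n in listed if n in variables}
    missing = [n for n in listed if n not in variables]
    return found, missing


# Hypothetical mooring file: three instruments at three depths.
variables = {
    "TEMP": {
        "dims": ["time", "depth"],
        "attrs": {"ancillary_variables": "INST_SN TEMP_accuracy"},
    },
    "INST_SN": {
        "dims": ["depth"],
        "attrs": {"long_name": "instrument_serial_number"},
        "data": [1043, 1044, 2210],  # made-up serial numbers
    },
}

found, missing = resolve_ancillary(variables, "TEMP")
```

Reporting unresolved names separately matters because the attribute only works if every listed name matches a variable name in the file exactly.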
Re: [CF-metadata] per-variable metadata?
Hi Nan,

Encoding what I would regard as usage metadata into ancillary variables is one solution, but I would much rather see a formalised metadata model for doing this. As you say, a standard way of doing this is long overdue. My hope is that SeaDataNet II, starting later this year, will take on and address this challenge. However, if anybody knows of anything already in existence that fits the bill, it would be good to prevent yet another wheel being reinvented.

Cheers,
Roy.
Re: [CF-metadata] per-variable metadata?
On 8/4/2011 1:30 PM, Upendra Dadi wrote:
> Hi Steve, Nan,
>
> Is this allowed in CF? Isn't ancillary_variable meant to be used for per-value metadata, i.e. metadata for each and every value in the variable it is referring to? If so, shouldn't the ancillary_variable have the same set of dimensions, in the same order, as the variable it is referring to?

Hi Upendra, Nan, Roy,

First off, just to comment that these topics seem appropriate as a logical next step. The new Discrete Geometries chapter largely focused on the file structure aspects -- storing the numbers. Community by community the metadata needs vary, of course. Defining standard profiles (e.g. OceanSites) is the natural approach to standardizing metadata contents. So I think we are all asking the question: what are the general rules and structures that should be followed so that generic applications are best able to access and utilize the specialized metadata encoded per a standardized CF profile?

A second point is to confess that I wasn't personally involved in the discussions that led to the ancillary_variables machinery. I'm ready to stand corrected if I misinterpret the words found in CF 1.5. Let's just get the ideas out on the table and let others comment.

For the case that Nan has described, if one were using the techniques of chapter 9 (https://cf-pcmdi.llnl.gov/trac/attachment/ticket/37/CFch9-may4.docx?format=raw), the metadata would be tied to the variable TEMP by its station index, rather than by an ancillary_variables attribute. In this example the station_info variable is a model for Instrument_manufacturer, Instrument_model, etc.
A9.2.4 Contiguous ragged array representation of timeSeries

    dimensions:
      station = 23 ;
      obs = 1234 ;

    variables:
      float lon(station) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "station longitude" ;
        lon:units = "degrees_east" ;
      float lat(station) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "station latitude" ;
        lat:units = "degrees_north" ;
      char station_name(station, name_strlen) ;
        station_name:long_name = "station name" ;
        station_name:cf_role = "timeseries_id" ;
      int station_info(station) ;
        station_info:long_name = "some kind of station info" ;
      int row_size(station) ;
        row_size:long_name = "number of observations for this station" ;
        row_size:sample_dimension = "obs" ;
      double time(obs) ;
        time:standard_name = "time" ;
        time:long_name = "time of measurement" ;
        time:units = "days since 1970-01-01 00:00:00" ;
      float humidity(obs) ;
        humidity:standard_name = "specific_humidity" ;

Nan, I think in your example the depth dimension is effectively the same as the station dimension in A9.2.4 (or 9.2.1) -- independent instruments deployed at a list of depths (stations) with metadata describing each depth. So the question is whether the association of metadata through the station (or depth) dimension is sufficient. (I think it is.) Or is there a use case that demonstrates that the ancillary_variables machinery is needed as well?

Upendra, your point, "ancillary_variable meant to be used for per value metadata i.e. metadata for each and every value in the variable it is referring to", is a strict interpretation of the opening sentence of 3.4. Ancillary Data (http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#ancillary-data): "one data variable provides metadata about the individual values of another data variable".
That interpretation would rule out cases like the following, which seem desirable to encode (imagining an instrument such as a Doppler profiler, where the uncertainty in a velocity measurement is a function of depth):

    float q(time, depth) ;
      q:standard_name = "upward_sea_water_velocity" ;
      q:ancillary_variables = "q_uncertainty" ;
    float q_uncertainty(depth) ;

Some word-smithing seems to be in order to clarify that opening sentence of section 3.4.

- Steve

> Upendra
>
> On 8/4/2011 2:17 PM, Nan Galbraith wrote:
>> Hi Steve -
>>
>> I'm very interested in the background discussion on this - any chance of bringing it into the foreground?
>>
>> I'm using ancillary variables in 2-D in situ data files to describe instruments and things like precision, accuracy, sample scheme, etc. For temperature files from moorings where different sensor types are at different depths, I'd like to use something like TEMP:ancillary_variables = Instrument_manufacturer Instrument_model Instrument_sample_scheme Instrument_serial_number TEMP_qc_procedure TEMP_accuracy TEMP_precision TEMP_resolution; and then short INST_SN(depth) ; INST_SN:long_name =