Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file

2011-08-04 Thread Heiko Klein

Dear Phil,

A programmer usually has to decide whether or not to accept 'minor' 
problems with the input. Users usually choose the program that accepts 
the most input, so a CF-processing program that rejects a dataset 
without a datum won't become very popular. Of course, a program that 
lets the user modify or correct the datum will be even more popular.


The climate and forecast modellers have for a very long time used a 
spherical earth with a radius of around 6371km (= GRS1980 Authalic 
Sphere, not the GRS1980 ellipsoid). Therefore, all CF-processing 
programs I know of (e.g. netcdf-java, ncview, fimex) assume a spherical 
earth of that radius. So the proposed default is the de facto standard, 
and I believe it is better to write it down than to keep it informal.


If datum information is merely an extension to CF, it will take very 
long until it is accepted. If CF always comes with a datum, 
data-providers will adapt more quickly, since a missing datum might be 
an obvious error (high priority) rather than a missing extension (low 
priority).



Setting a default for something previously undefined is very common. A 
precedent, which meant the same amount of work (an update of attributes) 
for all non-English data providers, was the definition of UTF-8 as the 
character encoding in netcdf-3.6.3 in ~2008.


Best regards,

Heiko

On 2011-08-03 15:48, Bentley, Philip wrote:

Dear Heiko,


> I hope CF could define a default datum, e.g. the GRS1980
> Authalic Sphere, since this matches most closely with
> existing software (netcdf-java). This would make life easier
> for the software-developers who have to use something if
> nothing is given.


I'm not sure that defining a default datum for CF is the right way to go
in this instance. I would have thought that if a particular piece of
data analysis is at a resolution that requires a geodetic datum to be
specified then, in the absence of the actual one being defined in metadata,
it's not clear to me that using some semi-arbitrary, and potentially
invalid, default datum is any better than giving the user the
opportunity to select the one s/he believes to be the most appropriate
for the task in hand.

The current CF conventions include a (fairly minimal) set of metadata
attributes which can be used to describe the basic properties of the
coordinate reference system associated with a given dataset. The onus
then is on data producers to utilise those metadata attributes to
describe their data to the fullest extent possible. Furthermore, other
non-CF attributes may be used to augment the standard set - over time
some of these additional attributes would no doubt find their way into
the CF specification.

Ultimately, if end-users consider that a given dataset has insufficient
metadata to justify its use within a particular context, then they can
always choose to ignore that dataset. With the passage of time - and in
true Darwinian fashion - such datasets (and their producers) will find
that they are increasingly disregarded/overlooked in analyses. Hopefully
this would galvanise such data producers into improving the quality of
their spatial metadata!

Regards,
Phil


PS: if a default datum were to be encoded into the CF conventions, I'd
imagine that the WGS84 datum would be the way to go rather than GRS80
which, if I understand correctly, has somewhat more of a bias towards
use over the North American continent. That said, I suspect the
differences between the 2 datums are sufficiently small as to get lost
in the underflow for many metocean research applications.
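To put a number on "sufficiently small", here is a minimal sketch that compares only the shapes of the GRS80 and WGS84 reference ellipsoids, using their commonly published defining constants. It says nothing about offsets between particular datum realisations, which is a separate question.

```python
# Commonly published defining constants of the two reference ellipsoids.
SEMI_MAJOR_AXIS_M = 6378137.0      # identical for GRS80 and WGS84
INV_FLATTENING_GRS80 = 298.257222101
INV_FLATTENING_WGS84 = 298.257223563


def semi_minor_axis(a, inv_f):
    """Return the semi-minor axis b = a * (1 - f) in metres."""
    return a * (1.0 - 1.0 / inv_f)


b_grs80 = semi_minor_axis(SEMI_MAJOR_AXIS_M, INV_FLATTENING_GRS80)
b_wgs84 = semi_minor_axis(SEMI_MAJOR_AXIS_M, INV_FLATTENING_WGS84)

# The two polar radii differ by roughly a tenth of a millimetre.
delta_mm = abs(b_wgs84 - b_grs80) * 1000.0

print(f"GRS80 semi-minor axis: {b_grs80:.4f} m")
print(f"WGS84 semi-minor axis: {b_wgs84:.4f} m")
print(f"difference: {delta_mm:.3f} mm")
```

On this measure the two ellipsoids are practically indistinguishable for metocean work.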



___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file

2011-08-04 Thread Heiko Klein

Jon,

just to make you think twice:

How will ncWMS display data without datum? Using netcdf-java's default?


I think 1) and 2) are both valid, and I propose something like:


All geo-data must come with the best datum information. Missing 
ellipsoid information will be interpreted as a 6371km sphere. Missing 
Bursa-Wolf parameters will be interpreted as all 0.



This would discourage using the default, but at least it is written 
down. And the default ellipsoid is discouraging enough in itself for 
anybody caring about ellipsoids, but nicely backwards compatible.
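A minimal sketch of how client software might apply the rule proposed above. The attribute names `earth_radius`, `semi_major_axis`, `semi_minor_axis` and `inverse_flattening` follow the CF grid-mapping attributes. The `towgs84` key, the function name and the exact default radius in metres are assumptions made for this illustration only.

```python
# Hypothetical defaults taken from the proposal in this message.
DEFAULT_EARTH_RADIUS_M = 6371000.0   # "6371km sphere"; exact metres are an assumption
DEFAULT_BURSA_WOLF = (0.0,) * 7      # 3 shifts, 3 rotations, 1 scale: all zero


def resolve_datum(attrs):
    """Fill in missing datum information from a dict of grid-mapping attributes.

    Returns the ellipsoid axes in metres, the Bursa-Wolf parameters, and
    flags recording which parts had to be defaulted.
    """
    attrs = attrs or {}
    ellipsoid_defaulted = False
    if "earth_radius" in attrs:
        a = b = float(attrs["earth_radius"])
    elif "semi_major_axis" in attrs and "semi_minor_axis" in attrs:
        a = float(attrs["semi_major_axis"])
        b = float(attrs["semi_minor_axis"])
    elif "semi_major_axis" in attrs and "inverse_flattening" in attrs:
        a = float(attrs["semi_major_axis"])
        inv_f = float(attrs["inverse_flattening"])
        # An inverse flattening of 0 is treated as a sphere (assumption of this sketch).
        b = a if inv_f == 0.0 else a * (1.0 - 1.0 / inv_f)
    else:
        # Nothing usable was given: fall back to the default sphere.
        a = b = DEFAULT_EARTH_RADIUS_M
        ellipsoid_defaulted = True

    bursa_wolf_defaulted = "towgs84" not in attrs
    if bursa_wolf_defaulted:
        bursa_wolf = DEFAULT_BURSA_WOLF
    else:
        bursa_wolf = tuple(float(x) for x in attrs["towgs84"])

    return {
        "semi_major_axis": a,
        "semi_minor_axis": b,
        "bursa_wolf": bursa_wolf,
        "ellipsoid_defaulted": ellipsoid_defaulted,
        "bursa_wolf_defaulted": bursa_wolf_defaulted,
    }


no_datum = resolve_datum({})
wgs84 = resolve_datum({"semi_major_axis": 6378137.0,
                       "inverse_flattening": 298.257223563})
print(no_datum)
print(wgs84)
```

Returning the two `*_defaulted` flags lets a client warn the user that the default was used, which fits the intent of discouraging reliance on it.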


Heiko


On 2011-08-03 16:30, Jon Blower wrote:

John -


1) possible recommendations by CF to always include the datum, and making sure 
we have the right metadata to do so.



2) possible recommendations by CF as to what to do if the datum is not present, 
or is only partially specified.


yes I think that's right.  Personally I'm not in favour of a default datum - if 
the datum is unknown this should be expressed clearly.

Jon






Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file

2011-08-04 Thread Jonathan Gregory
Dear all

I agree with Balaji about this:
> It is perfectly possible to configure a GCM out of components with
> inconsistent datums.
...
> I still feel my principal objection to specifying the datum is the
> implication of precision that does not exist.
This kind of argument may apply more to models, but it appears from the
comments that it might apply to obs datasets as well, for instance when they
have been assembled from various sources, which could be poorly documented.

Hence in terms of John's questions, I think this is good:

> 1) possible recommendations by CF to always include the datum, and
> making sure we have the right metadata to do so.

We could make a recommendation in the standard document to include the
grid_mapping, if there is an applicable one, whenever there are horizontal
coordinates. The CF checker would then produce a warning if the grid_mapping
was absent. But I don't think it is the role of CF to include:

> 2) possible recommendations by CF as to what to do if the datum is not
> present, or is only partially specified.

CF is a convention for allowing metadata to be provided. Its purpose is not
to prescribe or guess what the metadata should be. There is lots of important
metadata which is optional, such as standard names and bounds, and we do not
recommend what should be done when they are missing.
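The checker warning suggested above could look roughly like the following sketch. It works on a plain dictionary rather than a real netCDF file, and the function name, the data layout and the warning text are all hypothetical; it is not how the actual CF checker is implemented.

```python
HORIZONTAL_STANDARD_NAMES = {
    "latitude", "longitude",
    "projection_x_coordinate", "projection_y_coordinate",
}


def check_grid_mapping(variables):
    """Warn about data variables with horizontal coordinates but no grid_mapping.

    `variables` maps a variable name to a dict with keys "dimensions"
    (tuple of names) and "attributes" (dict). Returns a list of warnings.
    """
    # Variables that themselves are horizontal coordinates.
    horizontal = {
        name for name, var in variables.items()
        if var["attributes"].get("standard_name") in HORIZONTAL_STANDARD_NAMES
    }
    warnings = []
    for name, var in variables.items():
        if name in horizontal:
            continue
        attrs = var["attributes"]
        # Coordinates are found via dimensions or the "coordinates" attribute.
        referenced = set(var["dimensions"]) | set(attrs.get("coordinates", "").split())
        if referenced & horizontal and "grid_mapping" not in attrs:
            warnings.append(
                f"WARNING: variable '{name}' has horizontal coordinates "
                "but no grid_mapping attribute"
            )
    return warnings


example = {
    "lat": {"dimensions": ("lat",), "attributes": {"standard_name": "latitude"}},
    "lon": {"dimensions": ("lon",), "attributes": {"standard_name": "longitude"}},
    "time": {"dimensions": ("time",), "attributes": {"standard_name": "time"}},
    "tas": {"dimensions": ("time", "lat", "lon"), "attributes": {}},
    "pr": {"dimensions": ("time", "lat", "lon"), "attributes": {"grid_mapping": "crs"}},
}
messages = check_grid_mapping(example)
print("\n".join(messages))
```

Because this is a warning and not an error, files without a grid_mapping would still pass, in line with treating the datum as recommended but optional metadata.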

Cheers

Jonathan


Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file

2011-08-04 Thread Jon Blower
Hi Jonathan, all,

> There is lots of important metadata which is optional, such as standard names

I think the case of a missing datum is different from the case of a missing 
standard name.  In order to plot data in a GIS (or similar system that allows 
different datasets to be compared or overlain spatially) you *always* need a 
datum.  If one isn't supplied by the data provider, the user or client software 
has to invent or guess one.  That's not true for a standard name (although 
arguably it is true for the coordinate bounds).

Hence Heiko's comment about what ncWMS does - in fact ncWMS assumes WGS84 for 
lat-lon coordinates, but perhaps it should assume something different.  It 
*does* make a difference in lots of cases - both model and obs data.
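One way to gauge the size of the effect is to take a latitude value and ask how far apart the two readings are when it is treated as a geodetic latitude on the WGS84 ellipsoid versus a geocentric latitude on a sphere. This is only a rough sketch: whether that is the right conversion for a given dataset is exactly the open question, and the mean radius used for the distance is an approximation.

```python
import math

INV_FLATTENING_WGS84 = 298.257223563
F = 1.0 / INV_FLATTENING_WGS84
E2 = F * (2.0 - F)          # first eccentricity squared
MEAN_RADIUS_KM = 6371.0     # approximate mean earth radius


def geocentric_latitude(geodetic_lat_deg):
    """Geocentric latitude of a surface point with the given geodetic latitude."""
    phi = math.radians(geodetic_lat_deg)
    return math.degrees(math.atan((1.0 - E2) * math.tan(phi)))


def latitude_shift_km(lat_deg):
    """Approximate north-south distance between the two interpretations."""
    delta_deg = lat_deg - geocentric_latitude(lat_deg)
    return math.radians(delta_deg) * MEAN_RADIUS_KM


for lat in (0.0, 30.0, 45.0, 60.0):
    print(f"{lat:5.1f} deg: shift of about {latitude_shift_km(lat):.1f} km")
```

The shift vanishes at the equator and peaks near 45 degrees at roughly 20 km, which is large compared with the grid spacing of many regional models.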


It seems we're all agreeing that CF should at least have the right tags to 
ensure that a datum *can* be well specified, for 1D grid coordinates, projected 
grid coordinates, 2D curvilinear grid coordinates and geolocation of 
observations.  Would you agree?

What seems to be under debate is what happens when the datum is genuinely 
(and regrettably) not known or incompletely specified (or perhaps where the 
dataset has been derived from mixed datums).  The proposals seem to be:

1. Rule this out of scope for CF and don't provide any datum information at all.
2. Invent CF tags that explicitly say the datum is unknown, and perhaps provide 
a reason (to give the user a bit more information for his/her interpretation).
3. Specify a default datum (noting that client software or the user will 
commonly have to invent one anyway).

Is this a fair summary of the situation?


Best wishes,
Jon




Re: [CF-metadata] the need to store lat/lon coordinates in a CF-compliant netCDF file

2011-08-04 Thread Upendra Dadi

I think this is a good summary. More comments below:

On 8/4/2011 5:15 AM, Jon Blower wrote:

Hi Jonathan, all,


There is lots of important metadata which is optional, such as standard names

I think the case of a missing datum is different from the case of a missing 
standard name.  In order to plot data in a GIS (or similar system that allows 
different datasets to be compared or overlain spatially) you *always* need a 
datum.  If one isn't supplied by the data provider, the user or client software 
has to invent or guess one.  That's not true for a standard name (although 
arguably it is true for the coordinate bounds).
   Not just GIS: any operation which involves data integration needs to 
bring all the datasets in question to a common reference frame, and when 
it comes to a spatial reference frame, the datum is absolutely crucial. 
Data integration is the keyword here. Without a datum, the dataset is 
much less useful for data integration. If the data provider expects the 
data to be used in any such operation, the datum should be provided. Of 
course, there are situations when the datum is genuinely not known even 
to the data provider. In such situations, I think it should be the data 
provider who guesses the datum after due diligence, not the user.


Upendra




Re: [CF-metadata] per-variable metadata?

2011-08-04 Thread Steve Hankin

Hi Jeff,

Each variable in a CF file may possess an ancillary_variables 
attribute that points to other variables related to it 
(http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#ancillary-data).  
To attach flags to a variable, use ancillary_variables to point to a 
variable that has flag_values and flag_meanings attributes 
(http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#flags).
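As a rough illustration of how a reader might follow that chain, the sketch below models two variables as plain dictionaries and decodes the flag values through the ancillary variable. The variable names TEMP and TEMP_qc, the flag values and the flag meanings are invented for the example. Only the attribute names ancillary_variables, flag_values and flag_meanings come from CF, where flag_meanings is a blank-separated string.

```python
def decode_flags(flag_values, flag_meanings, data):
    """Translate raw flag values into their meanings."""
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings differ in length")
    lookup = dict(zip(flag_values, meanings))
    return [lookup.get(value, "unknown") for value in data]


# Hypothetical file contents, modelled as dictionaries.
variables = {
    "TEMP": {
        "attributes": {"ancillary_variables": "TEMP_qc"},
        "data": [12.1, 12.3, 99.9, 12.2],
    },
    "TEMP_qc": {
        "attributes": {
            "flag_values": [0, 1, 2],
            "flag_meanings": "good questionable bad",
        },
        "data": [0, 0, 2, 1],
    },
}

# Follow ancillary_variables from the data variable to its flag variable.
decoded = {}
for name in variables["TEMP"]["attributes"]["ancillary_variables"].split():
    attrs = variables[name]["attributes"]
    if "flag_values" in attrs and "flag_meanings" in attrs:
        decoded[name] = decode_flags(
            attrs["flag_values"], attrs["flag_meanings"], variables[name]["data"]
        )

print(decoded)
```

Each observed quantity can carry its own flag variable this way, which addresses the per-quantity quality flags in the original question.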


We have started a discussion in the background about whether an example 
that illustrates this should be included in the CF documentation.


- Steve

=

On 7/14/2010 6:21 AM, Jeff deLaBeaujardiere wrote:

In another discussion, Steve Hankin wrote:
 CF generally favors attributes attached to variables over attributes 
attached to files


This reminds me of a question I wanted to ask: does CF have any 
conventions regarding how to handle data that contains multiple 
observed quantities with different quality flags, comment fields or 
other attributes for each quantity?


-Jeff DLB





Re: [CF-metadata] per-variable metadata?

2011-08-04 Thread Nan Galbraith

  
  
Hi Steve -

I'm very interested in the background discussion on this -
any chance of bringing it into the foreground?

I'm using ancillary variables in 2-D in situ data files to describe
instruments and things like precision, accuracy, sample scheme,
etc. For temperature files from moorings where different sensor
types are at different depths, I'd like to use something like

TEMP:ancillary_variables = "Instrument_manufacturer Instrument_model
 Instrument_sample_scheme Instrument_serial_number TEMP_qc_procedure
 TEMP_accuracy TEMP_precision TEMP_resolution" ;
and then
short INST_SN(depth) ;
  INST_SN:long_name = "instrument_serial_number" ;
... etc., etc.

If there's going to be a standard way to do this, I'd really like to
know about it - sooner rather than later.

Thanks - 
Nan

-- 
Nan Galbraith                          (508) 289-2444
Upper Ocean Processes Group            Mail Stop 29
Woods Hole Oceanographic Institution
Woods Hole, MA 02543



  




Re: [CF-metadata] per-variable metadata?

2011-08-04 Thread Lowry, Roy K.
Hi Nan,

Encoding what I would regard as usage metadata into ancillary variables is one 
solution, but I would much rather see a formalised metadata model for doing 
this.  As you say, a standard way of doing this is long overdue. My hope is that 
SeaDataNet II, starting later this year, will take on and address this challenge. 
However, if anybody knows of anything already in existence that fits the bill, 
it would be good to hear of it, to prevent yet another wheel being reinvented.

Cheers, Roy.



Re: [CF-metadata] per-variable metadata?

2011-08-04 Thread Steve Hankin



On 8/4/2011 1:30 PM, Upendra Dadi wrote:

Hi Steve & Nan,
  Is this allowed in CF? Isn't ancillary_variable meant to be used for 
per value metadata i.e. metadata for each and every value in the 
variable it is referring to? If so, shouldn't the ancillary_variable 
have the same set of dimensions and in the same order as the variable 
it is referring to?


Hi Upendra, Nan, Roy,

First off, just to comment that these topics seem appropriate as a 
logical next step. The new Discrete Geometries chapter largely focused 
on the file structure aspects -- storing the numbers.  The metadata 
needs vary community by community, of course.  Defining standard 
profiles (e.g. OceanSites) is the natural approach to standardizing 
metadata contents.  So I think we are all asking the question: what are 
the general rules and structures that should be followed so that generic 
applications are best able to access and utilize the specialized 
metadata encoded per a standardized CF profile?  A second point is to 
confess that I wasn't personally involved in the discussions that led 
to the ancillary_variables machinery.  I'm ready to stand corrected if I 
misinterpret the words found in CF 1.5.  Let's just get the ideas out on 
the table and let others comment.


For the case that Nan has described, if one were using the techniques of 
chapter 9 
(https://cf-pcmdi.llnl.gov/trac/attachment/ticket/37/CFch9-may4.docx?format=raw), 
the metadata would be tied to the variable TEMP by its station index, 
rather than by an ancillary_variables attribute.  In this example the 
station_info variable is a model for Instrument_manufacturer, 
Instrument_model, etc.


   A9.2.4 Contiguous ragged array representation of timeSeries

   dimensions:
     station = 23 ;
     obs = 1234 ;

   variables:
     float lon(station) ;
       lon:standard_name = "longitude" ;
       lon:long_name = "station longitude" ;
       lon:units = "degrees_east" ;
     float lat(station) ;
       lat:standard_name = "latitude" ;
       lat:long_name = "station latitude" ;
       lat:units = "degrees_north" ;
     char station_name(station, name_strlen) ;
       station_name:long_name = "station name" ;
       station_name:cf_role = "timeseries_id" ;
     int station_info(station) ;
       station_info:long_name = "some kind of station info" ;
     int row_size(station) ;
       row_size:long_name = "number of observations for this station" ;
       row_size:sample_dimension = "obs" ;

     double time(obs) ;
       time:standard_name = "time" ;
       time:long_name = "time of measurement" ;
       time:units = "days since 1970-01-01 00:00:00" ;
     float humidity(obs) ;
       humidity:standard_name = "specific_humidity" ;
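To make the association through the station index concrete, here is a small sketch of how a reader could use row_size to cut the flat obs dimension into per-station pieces and attach the per-station metadata. The numbers are invented and the helper name is hypothetical; only the role of row_size as a count of contiguous observations per station comes from the listing above.

```python
from itertools import accumulate


def station_slices(row_size):
    """Return one slice into the obs dimension per station."""
    ends = list(accumulate(row_size))
    starts = [0] + ends[:-1]
    return [slice(start, end) for start, end in zip(starts, ends)]


# Invented values for three stations and five observations.
row_size = [3, 0, 2]                 # observations per station
station_info = [101, 102, 103]       # per-station metadata
humidity = [0.010, 0.011, 0.012, 0.020, 0.021]

# Tie each block of observations to the metadata of its station.
per_station = {
    station_info[i]: humidity[piece]
    for i, piece in enumerate(station_slices(row_size))
}
print(per_station)
```

No ancillary_variables attribute is involved: the shared station dimension alone links each observation to its station metadata.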

Nan, I think in your example the depth dimension is effectively the 
same as the station dimension in A9.2.4 (or 9.2.1) -- independent 
instruments deployed at a list of depths (stations) with metadata 
describing each depth.   So the question is whether the association 
of metadata through the station (or depth) dimension is sufficient.  (I 
think it is.)  Or is there a use case that demonstrates that the 
ancillary_variables machinery is needed as well?


Upendra, your point, "ancillary_variable meant to be used for per 
value metadata i.e. metadata for each and every value in the variable it 
is referring to", is a strict interpretation of the opening sentence of 
3.4. Ancillary Data 
(http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#ancillary-data): 
"one data variable provides metadata about the individual values of 
another data variable".  That interpretation would rule out cases like 
the following, which seem desirable to encode (imagining an instrument 
such as a Doppler profiler, where the uncertainty in a velocity 
measurement is a function of depth):


  float q(time, depth) ;
    q:standard_name = "upward_sea_water_velocity" ;
    q:ancillary_variables = "q_uncertainty" ;
  float q_uncertainty(depth) ;

Some word-smithing seems to be in order to clarify that opening sentence 
of section 3.4.
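A looser rule that would admit the q_uncertainty case is to require only that the ancillary variable's dimensions are a subset of the parent's. The sketch below shows that check and how the per-depth uncertainty would then pair with every time step. This rule is an assumption for discussion, not wording from CF 1.5, and the function names and values are invented.

```python
def ancillary_dims_compatible(parent_dims, ancillary_dims):
    """True if every ancillary dimension also appears on the parent variable."""
    return set(ancillary_dims) <= set(parent_dims)


def pair_with_depth_uncertainty(values, uncertainty):
    """Pair values[time][depth] with uncertainty[depth] for every time step."""
    return [
        [(value, sigma) for value, sigma in zip(row, uncertainty)]
        for row in values
    ]


q_dims = ("time", "depth")
q_uncertainty_dims = ("depth",)
print(ancillary_dims_compatible(q_dims, q_uncertainty_dims))

# Two time steps at three depths, with one uncertainty per depth.
q = [[0.10, 0.12, 0.15],
     [0.11, 0.13, 0.14]]
q_uncertainty = [0.01, 0.02, 0.05]
paired = pair_with_depth_uncertainty(q, q_uncertainty)
print(paired[1])
```

Under the strict reading, by contrast, the check would be equality of the two dimension tuples, which rejects this example.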


   - Steve 



Upendra


On 8/4/2011 2:17 PM, Nan Galbraith wrote:

Hi Steve -

I'm very interested in the background discussion on this -
any chance of bringing it into the foreground?

I'm using ancillary variables in 2-D in situ data files to describe
instruments and things like precision, accuracy, sample scheme,
etc.. For temperature files from moorings where different sensor
types are at different depths, I'd like to use something like

TEMP:ancillary_variables = Instrument_manufacturer Instrument_model
 Instrument_sample_scheme Instrument_serial_number 
TEMP_qc_procedure

 TEMP_accuracy TEMP_precision TEMP_resolution;
and then
short INST_SN(depth) ;
INST_SN:long_name =