Re: [CF-metadata] non-standard standard_names

2010-05-13 Thread Lowry, Roy K
Dear All,

The 'fast track' approach being discussed has promise and is pretty much in 
line with the ISO vocabulary model (in which terms have proposed, accepted, 
deprecated or deleted status) used in resources like the GEMET thesaurus. However, 
there are important details to consider, such as version management (what event 
triggers the publication of a new version of the vocabulary?).
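To make the version-management question concrete, here is a minimal Python sketch of one possible answer: every status change publishes a new, immutable snapshot of the vocabulary. The four status values come from the message above; the allowed transitions in `ALLOWED` and the "publish on every change" rule are assumptions for illustration, not the actual rules of the ISO model or of GEMET.

```python
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    DEPRECATED = "deprecated"
    DELETED = "deleted"


# Assumed transition rules, for illustration only (not taken from ISO or GEMET).
ALLOWED = {
    Status.PROPOSED: {Status.ACCEPTED, Status.DEPRECATED, Status.DELETED},
    Status.ACCEPTED: {Status.DEPRECATED},
    Status.DEPRECATED: {Status.DELETED},
    Status.DELETED: set(),
}


class Vocabulary:
    """Toy vocabulary in which every status change publishes a new version.

    Deleted terms stay in the table with DELETED status, so older
    snapshots and newer ones can still be compared term by term.
    """

    def __init__(self):
        self.version = 0
        self.terms = {}    # term -> current Status
        self.history = []  # history[v - 1] is the snapshot published as version v

    def _publish(self):
        self.version += 1
        self.history.append(dict(self.terms))

    def propose(self, term):
        if term in self.terms:
            raise ValueError(f"{term!r} already exists")
        self.terms[term] = Status.PROPOSED
        self._publish()

    def change(self, term, new_status):
        old = self.terms[term]
        if new_status not in ALLOWED[old]:
            raise ValueError(f"cannot move {term!r} from {old.value} to {new_status.value}")
        self.terms[term] = new_status
        self._publish()

    def snapshot(self, version):
        """Return term statuses exactly as published in `version`."""
        return dict(self.history[version - 1])


vocab = Vocabulary()
vocab.propose("example_term")                  # publishes version 1
vocab.change("example_term", Status.ACCEPTED)  # publishes version 2
print(vocab.version, vocab.snapshot(1)["example_term"].value)  # prints: 2 proposed
```

Publishing on every change is the simplest trigger; batching changes into periodic releases would be an equally valid choice, at the cost of terms sitting in an unpublished state between releases.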

I am more uncomfortable with the concept of community-namespace Standard Name 
lists - I see this as the route to data ghettos (and don't truly believe that the 
Semantic Web would prevent this, as nobody will bother doing the mappings) - and 
with specialized standard names (in my view it's either a Standard Name or it 
isn't, and we have to accept that the nature of a Standard Name is moving away 
from the purity of a geophysical phenomenon).

Cheers, Roy 

From: cf-metadata-boun...@cgd.ucar.edu [cf-metadata-boun...@cgd.ucar.edu] On 
Behalf Of Nan Galbraith [ngalbra...@whoi.edu]
Sent: 12 May 2010 20:35
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] non-standard standard_names

The original proposal was to include names that have been rejected by
CF for being too specialized - these would be permanent parts of the
project vocabulary, not deprecated.

Many in situ instruments produce non-geophysical variables that fall
into this category; although this isn't what Martin had in mind, his
proposal - or something along the same lines - would help us get to
a standard naming scheme for this kind of data too.

- Nan

 So my proposal was to create a vocabulary, or more precisely an RDF
 store, that lets us:
  1) declare a name that may be proposed as a CF candidate
  2) make a statement that the name has been (or even 'is being')
 submitted to CF for consideration
  3a) make a statement that the name has been accepted as a CF name,
 and therefore is deprecated as a proposed name
  3b) make a statement that the name has been rejected as a CF name,
 and therefore is deprecated as a proposed name
 In either 3a or 3b,
  4) make a statement that the replacement representation of the name
 is xyz in some other vocabulary




--
***
* Nan Galbraith                    (508) 289-2444 *
* Upper Ocean Processes Group       Mail Stop 29 *
* Woods Hole Oceanographic Institution           *
* Woods Hole, MA 02543                           *
***



___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

-- 
This message (and any attachments) is for the recipient only. NERC
is subject to the Freedom of Information Act 2000 and the contents
of this email and any reply you make may be disclosed by NERC unless
it is exempt from release under the Act. Any material supplied to
NERC may be stored in an electronic records management system.



Re: [CF-metadata] non-standard standard_names

2010-05-12 Thread Steve Hankin

Hi Martin,

You've had two enthusiastic yes responses, so I guess I have the 
privilege to be the wet blanket.  So it goes.  I will give only a very 
cautious and limited yes.  Not an outright no ... but a suggestion 
for more thought and discussion.


The proposal here is effectively the creation of 'private tables' as a 
means of achieving extensibility.  We've had an opportunity to see the 
hazards embedded in this approach as a long-term evolutionary process in 
WMO. Over time the custom tables evolve to have a quasi-official 
status -- entire sub-communities rely upon them -- but without 
necessarily a corresponding methodical control over their creation and 
distribution. With BUFR and GRIB files the proliferation of distinct 
tables has led to serious interoperability problems.


To avoid repeating these problems with your proposal, CF clients must be 
provided with *iron-clad ways to be assured that they are referring to 
the same vocabulary tables that the data author was referring to at the 
time that the data were written*.  Since we want CF files to ensure 
interoperability when there are *years separating the writing of data 
from reading it*, your strategy needs to ensure careful version control 
over the private tables.  This imposes a significant burden on you as 
the creator of a project_standard_name table -- essentially a 
requirement to retain and serve out older table versions in perpetuity 
(we could argue over what that means).  The use of semantic web 
technologies will not alter these considerations for the foreseeable 
future (though over the long term sophisticated inference engines might 
...).  The ontologies still need to be informed by correct information, 
which implies knowledge of the version-controlled private vocabularies.
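The version-control requirement described above can be sketched as an archive that never discards or overwrites a published table version, plus a file-side reference that pins both the table and the exact version. This is a minimal Python sketch under stated assumptions: `TableRef`, `TableArchive`, and the idea that a file records a (table, version) pair are hypothetical and are not part of CF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """What a file would have to record: the private table and the exact version used."""
    table: str
    version: int


class TableArchive:
    """Retains every published version of every private table; nothing is overwritten."""

    def __init__(self):
        self._tables = {}  # (table, version) -> frozenset of defined names

    def publish(self, table, version, names):
        key = (table, version)
        if key in self._tables:
            # A published version must never change, or old files become ambiguous.
            raise ValueError(f"{table} v{version} is already published")
        self._tables[key] = frozenset(names)

    def is_defined(self, ref, name):
        """Check a name against the table version the data author referred to."""
        names = self._tables.get((ref.table, ref.version))
        if names is None:
            raise LookupError(f"{ref.table} v{ref.version} is not being served")
        return name in names


archive = TableArchive()
archive.publish("example_project", 1, {"name_a"})
archive.publish("example_project", 2, {"name_b"})  # name_a dropped in version 2
```

A file written against version 1 still validates years later, because the check uses the version recorded in the file rather than whatever version is current; the cost is exactly the burden the message describes, serving every old version indefinitely.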


A project_standard_name may have one of three life histories:  it 
may never become accepted into the standard_name table; it may be 
accepted as-is; or it may be accepted with alterations.  The following 
suggested restriction illustrates some of the difficulties: A variable 
can contain either a standard_name or project_standard_name attribute 
but not both.  What's behind this restriction?  Given the uncertain 
life history of a project_standard_name, if it has been in use for 
(say) a year and is found in thousands of files that are being shared 
around the community, doesn't that generate a need to continue support 
for it?
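The three life histories, and the need to keep old files readable, can be illustrated with a small lookup that a reader might apply to a project_standard_name found in an old file. This is a hypothetical Python sketch: the `history` mapping is assumed to be maintained by the project, and nothing like it exists in CF today.

```python
from enum import Enum


class Fate(Enum):
    """The three possible life histories of a project_standard_name."""
    NEVER_ACCEPTED = "never accepted"
    ACCEPTED_AS_IS = "accepted as-is"
    ACCEPTED_ALTERED = "accepted with alterations"


def current_name(project_name, history):
    """Translate a project_standard_name found in a (possibly old) file.

    `history` is a hypothetical, project-maintained dict mapping
    project name -> (Fate, CF standard_name or None).  Returns
    (name_to_use, is_cf_name).  Names with no recorded outcome are
    treated as never accepted.
    """
    fate, cf_name = history.get(project_name, (Fate.NEVER_ACCEPTED, None))
    if fate is Fate.NEVER_ACCEPTED:
        # No CF equivalent: readers must keep supporting the project name itself.
        return project_name, False
    if fate is Fate.ACCEPTED_AS_IS:
        # Same string in both vocabularies.
        return project_name, True
    # Accepted with alterations: the name in the file differs from the CF name.
    return cf_name, True
```

Only the "accepted with alterations" case changes the name a reader ends up with; in the other two cases the string already in thousands of files keeps working, which is the continuing-support burden raised above.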


Two alternative approaches (both flawed, of course ... the nature of the 
beast):


  1. Should the CF standard_name process, itself, include a
 provisional fast-track, that allows names to be added very
 quickly with no guarantee that they will have a lasting status,
 but with an *iron-clad guarantee that the provisional names will
 be retained* (and so-identified) in version-stamped (older) CF
 vocabularies?
 or
  2. Might you be better off using a *truly private* vocabulary of 
 project_standard_name strings, i.e. one that has no official
 status in CF at all?  There is no violation of the CF standard
 in doing this.  This approach makes it your private
 responsibility on behalf of your users to deal with files that are
 created in the period between proposing a CF standard_name and
 having it become part of the official table.


   - Steve



Schultz, Martin wrote:

Dear all,

we are currently cleaning all files on our TFHTAP multi-model
experiment server to make them fully CF(1.0) conformant. It has been
about 3 years since we had drafted the original format description of
these experiments and also initiated the standard name discussion for
chemical constituents (thanks again to Christiane Textor who did a lot
of this initial work). Many standard names which we needed have now been
defined (thanks to all who contributed and to Allison for maintaining
the list!). Nevertheless, there are a number of model variables left for
which no standard name has been agreed upon and where we (or the CF
mailing list group) also felt that they are too specialized to deserve a
standard name. From the perspective of the CF community this may not
be an issue, but in the context of interoperability (we now operate a
WCS server to share these files) the fact that some variables do have a
standard_name attribute and others don't poses considerable challenges.
The CF convention states that either standard_name or long_name should
be present. In our view, the long_name attribute is a poor substitute
for the standard_name, because it has no rules attached. We are now
planning to replace illegal standard_name attributes with a new
htap_standard_name attribute, which shall make clear that these names
are derived according to the CF guidelines, but they are not accepted
standard_names. Such a concept would enable software tools to easily
scan additional standard_name tables and make use of the well-defined
semantics that a standard_name provides without having to 

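A tool of the kind Martin describes might try the official standard_name first, then a project attribute such as htap_standard_name, and accept a value only if the corresponding table defines it. This is a minimal Python sketch; the function name and the table contents are hypothetical, and real tables would be loaded from the CF standard name table and from the project's own published list.

```python
def find_standard_name(attrs, tables):
    """Return (name, attribute) for the first attribute whose value its table defines.

    `attrs` holds a variable's netCDF attributes as a dict; `tables` maps an
    attribute name (e.g. "standard_name", "htap_standard_name") to the set of
    names that table defines.  Dicts keep insertion order, so listing
    "standard_name" first makes the official CF name win when both are present.
    """
    for attr, defined in tables.items():
        value = attrs.get(attr)
        if value is not None and value in defined:
            return value, attr
    # Only long_name (or nothing usable) is present.
    return None, None


# Hypothetical table contents, for illustration only.
TABLES = {
    "standard_name": {"air_temperature"},
    "htap_standard_name": {"example_htap_tracer_name"},
}
```

Returning the attribute name alongside the value lets downstream software tell an accepted CF name from a project-derived one, which is the distinction the htap_standard_name attribute is meant to preserve.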
Re: [CF-metadata] non-standard standard_names -- CF alternative names

2010-05-12 Thread John Graybeal
OK, now I have to submit my other notion after all, which I think addresses 
some of Steve's concerns.  But let me semi-agree with his first paragraph -- 
I'm enthusiastic, but I think there are a lot of details to be agreed on.  I'll 
come back to that in a separate post.

I had thought it was important to provide a way to enter proposed CF terms in a 
common way/place, so that they can (a) be used by the originators and the 
community in the meantime, (b) be seen by the CF folks, and (c) be 
dispositioned appropriately when CF either accepts them or rejects them. So my 
proposal was to create a vocabulary, or more precisely an RDF store, that lets 
us:
 1) declare a name that may be proposed as a CF candidate
 2) make a statement that the name has been (or even 'is being') submitted
    to CF for consideration
 3a) make a statement that the name has been accepted as a CF name, and
    therefore is deprecated as a proposed name
 3b) make a statement that the name has been rejected as a CF name, and
    therefore is deprecated as a proposed name
In either 3a or 3b,
 4) make a statement that the replacement representation of the name is xyz
    in some other vocabulary
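The four kinds of statement listed above can be modelled as plain subject-predicate-object triples. The Python sketch below uses an in-memory set in place of a real RDF store, and every predicate name in it (the `ex:` terms) is a hypothetical placeholder, not an existing vocabulary.

```python
# Hypothetical predicate names; a real store would use URIs from an agreed vocabulary.
RDF_TYPE = "rdf:type"
SUBMITTED = "ex:submittedToCF"
ACCEPTED = "ex:acceptedAsCFName"
REJECTED = "ex:rejectedAsCFName"
DEPRECATED = "ex:deprecatedAsProposedName"
REPLACED_BY = "ex:replacementRepresentation"


class ProposalStore:
    """A minimal in-memory set of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """All objects o such that (subject, predicate, o) has been stated."""
        return {o for (s, p, o) in self.triples if s == subject and p == predicate}

    def declare(self, name):
        # 1) the name may be proposed as a CF candidate
        self.add(name, RDF_TYPE, "ex:CandidateName")

    def submit(self, name):
        # 2) the name has been (or is being) submitted to CF
        self.add(name, SUBMITTED, "true")

    def disposition(self, name, accepted, replacement=None):
        # 3a/3b) accepted or rejected; either way deprecated as a proposed name
        self.add(name, ACCEPTED if accepted else REJECTED, "true")
        self.add(name, DEPRECATED, "true")
        # 4) optionally, its replacement representation in some other vocabulary
        if replacement is not None:
            self.add(name, REPLACED_BY, replacement)

    def replacements(self, name):
        return self.objects(name, REPLACED_BY)
```

Because statements are only ever added, never edited, the full disposition history of a proposed name stays queryable after CF has decided on it.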

The relationship of this proposal to the previous thread is that it provides an 
implementation mechanism for the life cycle of the provisional terms. It also 
helps assure some of the things Steve is trying to ensure -- some of which only 
recently became possible with CF, and even then only manually, not through any 
automatable utility, interface, or URI convention.

Anyway, I don't want to encourage a detailed discussion of the above proposal, 
as it is secondary to Martin's original suggestion, and I feel sure it will 
have to be considered at some length in TRAC if we get that far. Just wanted to 
mention that the semantic technologies can enable some very useful 
views/approaches to some of these problems.

John

On May 12, 2010, at 11:22, Steve Hankin wrote:
