Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Simon.Cox
Semantic duplicates are my concern. 
I understand your assertion about the linked data environment, but note that 
duplicates are of no consequence only if everyone is doing reasoning. 

Simon

-----Original Message-----
From: Lowry, Roy K. [mailto:r...@bodc.ac.uk] 
Sent: Tuesday, 25 September 2012 2:33 AM
To: Cox, Simon (CESRE, Kensington); jgrayb...@ucsd.edu; cameronsmi...@llnl.gov
Cc: cf-metadata@cgd.ucar.edu; j.m.greg...@reading.ac.uk
Subject: RE: [CF-metadata] Another potentially useful extension to the 
standard_name table

Hello Simon,

If you're referring to syntactic duplicates then, provided the controlled 
vocabularies covering the grammar elements are well managed, the issue is 
addressed.

If you're referring to semantic duplicates (i.e. multiple Standard Names built 
from synonyms) then no, but there are opinions that have me 75% convinced that 
these are of little consequence in a linked data environment.

Cheers, Roy.



Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Cameron-smith, Philip
Hi Simon, et al.,

The question of semantic duplicates is something that CF has strongly tried to 
avoid.  In a system where we control the grammar and vocabulary rather than 
the std_names, we would either need to design it so that semantic duplicates 
are impossible, or find a way to link equivalent terms.  

Fortunately with my proposal this is not an issue because the primary control 
stays on the std_names.  We should also learn whether semantic duplicates will 
be a problem before we make a final leap.

Best wishes,

  Philip

---
Dr Philip Cameron-Smith, p...@llnl.gov, Lawrence Livermore National Lab.
---


> -----Original Message-----
> From: Lowry, Roy K. [mailto:r...@bodc.ac.uk]
> Sent: Monday, September 24, 2012 11:33 AM
> To: simon@csiro.au; jgrayb...@ucsd.edu; Cameron-smith, Philip
> Cc: cf-metadata@cgd.ucar.edu; j.m.greg...@reading.ac.uk
> Subject: RE: [CF-metadata] Another potentially useful extension to the
> standard_name table
> 
> Hello Simon,
> 
> If you're referring to syntactic duplicates then, provided the controlled
> vocabularies covering the grammar elements are well managed, the issue is
> addressed.
> 
> If you're referring to semantic duplicates (i.e. multiple Standard Names built
> from synonyms) then no, but there are opinions that have me 75% convinced
> that these are of little consequence in a linked data environment.
> 
> Cheers, Roy.

Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Lowry, Roy K.
Hello Simon,

If you're referring to syntactic duplicates then, provided the controlled 
vocabularies covering the grammar elements are well managed, the issue is 
addressed.

If you're referring to semantic duplicates (i.e. multiple Standard Names built 
from synonyms) then no, but there are opinions that have me 75% convinced that 
these are of little consequence in a linked data environment.

Cheers, Roy.


From: simon@csiro.au [simon@csiro.au]
Sent: 24 September 2012 04:33
To: Lowry, Roy K.; jgrayb...@ucsd.edu; cameronsmi...@llnl.gov
Cc: cf-metadata@cgd.ucar.edu; j.m.greg...@reading.ac.uk
Subject: RE: [CF-metadata] Another potentially useful extension to the  
standard_name table

Sorry if this is an ignorant/newbie question, but can I ask if the grammar for 
CF std_names implicitly provides a check on duplicates?

Simon


Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Lowry, Roy K.
Hello Jonathan,

I'm finding myself in total agreement with you and hope to start demonstrating 
over the coming weeks how vocabulary server technology can cover both Martin's 
use case and the use case for which you did your work.

Cheers, Roy.


From: Jonathan Gregory [j.m.greg...@reading.ac.uk]
Sent: 24 September 2012 17:53
To: Schultz, Martin
Cc: Lowry, Roy K.; cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Another potentially useful extension to the 
standard_name table

Dear Martin, Roy et al.

> I understand exactly what you want - or at least I think I do.  I think that 
> you would like to enter a URL representing the concept 'carbon monoxide' and 
> get back a document giving you all the Standard Names pertaining to carbon 
> monoxide.  Am I right?

I appreciate that this need is not the same one as a system for proposing
new standard names, on which I agree with the way Philip described it. But
couldn't both needs be served by a generating grammar? If we have a complete
grammar of standard_names, then we can record in the standard_name table how
each one is generated from phrases, as well as the final result. Although you
cannot always parse a standard_name unambiguously, you could look up what the
correct decomposition was, or search the table for the occurrence of a
particular species or other phrase in the decompositions.

Later we could take a step further by recognising that some names are
irregular and giving their decomposition in regular terms. Thus, for instance,
specific_humidity could be recognised as composed of the semantic elements
which would normally yield mass_fraction_of_water_vapor_in_air (which does not
occur in fact). This is somewhat like Martin's idea of aliases for irregular
names. It is like recognising that "went" = go + -ed. Of course, there is
plenty of linguistic theory about this kind of thing! My grammar does not
currently have such transformational rules.

Best wishes

Jonathan
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Jonathan Gregory
Dear Martin, Roy et al.

> I understand exactly what you want - or at least I think I do.  I think that 
> you would like to enter a URL representing the concept 'carbon monoxide' and 
> get back a document giving you all the Standard Names pertaining to carbon 
> monoxide.  Am I right?

I appreciate that this need is not the same one as a system for proposing
new standard names, on which I agree with the way Philip described it. But
couldn't both needs be served by a generating grammar? If we have a complete
grammar of standard_names, then we can record in the standard_name table how
each one is generated from phrases, as well as the final result. Although you
cannot always parse a standard_name unambiguously, you could look up what the
correct decomposition was, or search the table for the occurrence of a
particular species or other phrase in the decompositions.
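As a rough illustration of the idea in the paragraph above, the sketch below
stores each name together with the phrases it was generated from and then
searches the decompositions for a species. The table layout and the slot names
(quantity, species, medium) are assumptions made for this example only; they
are not part of the standard_name table or of the existing grammar program.

```python
# Hypothetical decomposition table: each standard name is recorded together
# with the phrases that generated it. Slot names are illustrative only.
DECOMPOSITIONS = {
    "mass_fraction_of_carbon_monoxide_in_air": {
        "quantity": "mass_fraction",
        "species": "carbon_monoxide",
        "medium": "air",
    },
    "mole_fraction_of_carbon_monoxide_in_air": {
        "quantity": "mole_fraction",
        "species": "carbon_monoxide",
        "medium": "air",
    },
    "mass_fraction_of_ozone_in_air": {
        "quantity": "mass_fraction",
        "species": "ozone",
        "medium": "air",
    },
}


def names_containing(phrase, table=DECOMPOSITIONS):
    """Return every name whose recorded decomposition uses the given phrase."""
    return sorted(
        name for name, parts in table.items() if phrase in parts.values()
    )


if __name__ == "__main__":
    # Look up the recorded decomposition instead of re-parsing the name.
    print(DECOMPOSITIONS["mass_fraction_of_ozone_in_air"])
    # Search the decompositions for a particular species.
    print(names_containing("carbon_monoxide"))
```

Because the decomposition is looked up rather than parsed, an ambiguous name
causes no difficulty: the table simply records which reading was intended.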

Later we could take a step further by recognising that some names are
irregular and giving their decomposition in regular terms. Thus, for instance,
specific_humidity could be recognised as composed of the semantic elements
which would normally yield mass_fraction_of_water_vapor_in_air (which does not
occur in fact). This is somewhat like Martin's idea of aliases for irregular
names. It is like recognising that "went" = go + -ed. Of course, there is
plenty of linguistic theory about this kind of thing! My grammar does not
currently have such transformational rules.
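The specific_humidity case could be sketched as follows. The mapping, the
three-slot template and the function names are assumptions for illustration;
the grammar program itself has no such transformational rules, as noted above.

```python
# Hypothetical record of irregular names and the semantic elements they
# stand for. Only the specific_humidity example comes from the discussion.
IRREGULAR = {
    "specific_humidity": ("mass_fraction", "water_vapor", "air"),
}


def generate(quantity, species, medium):
    """Build the regular form from its elements (illustrative template)."""
    return f"{quantity}_of_{species}_in_{medium}"


def regular_form(name):
    """Return the regular equivalent of an irregular name, else the name."""
    parts = IRREGULAR.get(name)
    return generate(*parts) if parts else name


if __name__ == "__main__":
    print(regular_form("specific_humidity"))
    print(regular_form("air_temperature"))
```

A search for water_vapor over the regular forms would then find
specific_humidity as well, even though the phrase does not appear in the name.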

Best wishes

Jonathan


Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Simon.Cox
Sorry if this is an ignorant/newbie question, but can I ask if the grammar for 
CF std_names implicitly provides a check on duplicates? 

Simon

-----Original Message-----
From: Lowry, Roy K. [mailto:r...@bodc.ac.uk] 
Sent: Saturday, 22 September 2012 4:27 PM
To: John Graybeal; Cameron-smith, Philip
Cc: cf-metadata@cgd.ucar.edu; Jonathan Gregory; Cox, Simon (CESRE, Kensington)
Subject: RE: [CF-metadata] Another potentially useful extension to the 
standard_name table

Hello Philip/John,

As John might remember, I attempted this approach a while back (I think I 
started in 2004) on another parameter vocabulary (the BODC vocabulary 
subsequently adopted by SeaDataNet).  I have yet to succeed in implementing it 
operationally.  This was because of two issues:

1) I never managed to derive a single model that described all the parameters 
in the dictionary - things like 'Concentration of PCB118 per unit wet weight of 
Mytilus edulis flesh' were particularly troublesome.
2) I simply ran out of energy building some of the vocabularies and never 
completed them

Admittedly, the problem was on a different scale - the vocab I was working on 
is ten times the size of the Standard Names list with a lot of biology in it. 
Further there are now standard resources available - such as WoRMS for taxon 
names that weren't mature then.  This brings me to the point that any 
development of grammar element vocabularies in CF should not be done in 
isolation, but should be sure to incorporate resources such as WoRMS for taxa, 
EEA for atmospheric pollutants and CAS for organic molecules.

However, there are other grammar-related vocabularies in CF such as the words 
used to express the amount of something in a matrix that should be in 
vocabularies that are totally under CF governance.  I totally agree that 
establishing these based on Jonathan's grammar would be a valuable step forward 
and would be happy to help do this.  This would be even more valuable if done 
in a collaborative manner that allowed a Standards Name List to be built from 
distributed semantic elements - or even interoperable semantic elements (nudge 
nudge wink wink!!).

As Philip suggests, an ideal kick-off for this process would be for Jonathan to 
prepare a grammar for the recently published Version 20 of Standard Names list 
and this time let's see if those of us in CF interested in parameter semantics 
can give his work the development it deserves.

Cheers, Roy.


From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] On Behalf Of John Graybeal 
[jgrayb...@ucsd.edu]
Sent: 22 September 2012 00:09
To: Cameron-smith, Philip
Cc: cf-metadata@cgd.ucar.edu; Jonathan Gregory
Subject: Re: [CF-metadata] Another potentially useful extension to the  
standard_name table

I like this.

I may be a step behind, but given a grammar parser/generator, we will have 
identified the slots. But we will not have identified all the terms that can 
fill those slots.

I don't think this is a huge challenge.  We will have (a) a list of terms 
already filling those slots, (b) candidate vocabularies that we could mine -- 
or designate -- or create -- to supply additional terms.  I would be delighted 
to participate in constructing the list of terms and vocabularies.  (Especially 
if you let me use MMI to store them. Wink wink nudge nudge. :->)

Anyway, please correct me if I'm missing the boat, or tell me if there's 
already a plan.

John

On Sep 21, 2012, at 15:52, Cameron-smith, Philip wrote:

> Hi All,
>
> I am just catching up on the backlog of CF emails.  My sense too is that this 
> discussion is trying to solve the problems caused by a lack of grammar with 
> alternatives and/or stopgaps.  My preference is to overcome the grammar/vocab 
> challenge, but I am well aware that an accepted solution has not yet occurred.
>
> In order to get us on the right track, I propose we take advantage of 
> Jonathan's suggestion in a way that doesn't require a full grammar/vocab 
> definition, and doesn't require any changes to the controlling CF documents.
>
> Specifically, I propose the following:
>
> 1) We leverage Jonathan's grammar program into (a) a program that checks a 
> proposed std_name by parsing it to see whether it fits existing grammar/vocab 
> rules, and/or (b) a std_name generation program.
>
> 2) Std_names are still proposed in the ordinary way, but if they have passed 
> the checker or been created through the generator then it will be easy for 
> people to accept them.  We might even move to a mode in which pre-approved 
> std_names are automatically accepted after a month, unless someone objects.
>
> This has several advantages:
>
> A) It will reduce time and effort by everyone to get std_names approved.
> B) Neither the parser nor the generator needs to be complete (ie, it 
> is OK if some existing names don't comply, or there are some valid new 
> cases they don't cover)
> C) Proposals that don't fit the standard construction ca
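
Item 1(a) of Philip's proposal, a checker that parses a proposed std_name to
see whether it fits existing grammar/vocab rules, might look something like
the sketch below. The single rule and the three small vocabularies are
invented for this example; they are not taken from Jonathan's grammar program.
In keeping with advantage B, a name that fails the check is not rejected, it
simply goes through discussion in the ordinary way.

```python
import re

# Hypothetical controlled vocabularies for the slots of one toy rule.
QUANTITIES = {"mass_fraction", "mole_fraction", "mass_concentration"}
SPECIES = {"carbon_monoxide", "ozone", "water_vapor"}
MEDIA = {"air", "sea_water"}


def _alternatives(terms):
    """Turn a vocabulary into a regex alternation of its escaped terms."""
    return "|".join(re.escape(term) for term in sorted(terms))


# Toy rule: <quantity>_of_<species>_in_<medium>
RULE = re.compile(
    f"^(?P<quantity>{_alternatives(QUANTITIES)})"
    f"_of_(?P<species>{_alternatives(SPECIES)})"
    f"_in_(?P<medium>{_alternatives(MEDIA)})$"
)


def check(proposed_name):
    """Return the decomposition if the name fits the rule, otherwise None."""
    match = RULE.match(proposed_name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ("mass_fraction_of_ozone_in_air", "specific_humidity"):
        result = check(name)
        status = "fits the rule" if result else "needs ordinary discussion"
        print(f"{name}: {status}")
```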

[CF-metadata] Choice of fill value for unpacked data

2012-09-24 Thread Bentley, Philip
Hi folks,

The final para of section 2.5.1 of the CF conventions document describes
the use of the _FillValue (or missing_value) attribute in the case of
data packed using the scale-and-offset method.  What is not clear - at
least to me - is what the preferred application behaviour should be in
the case where the data is unpacked and then written out to a new netCDF
file. In particular, what fill value should be used for the unpacked
data variable?

I presume that one wouldn't normally want to use the original fill value
since that value (typically an 8- or 16-bit integer) is quite likely to
fall within the normal range of the unpacked data (typically a 32- or
64-bit float).
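
A quick check of this, using the illustrative attribute values from the
example listing in this message (a 16-bit packed variable with fill value
-32768, scale_factor 1234.0 and add_offset 42.0). The function name is made
up for the sketch.

```python
# Values taken from the example listing in this message.
SCALE_FACTOR = 1234.0
ADD_OFFSET = 42.0
PACKED_FILL = -32768


def unpack(packed):
    """Standard scale-and-offset unpacking."""
    return packed * SCALE_FACTOR + ADD_OFFSET


# Range of unpacked values reachable from the valid packed values,
# i.e. every 16-bit integer except the fill value itself.
lowest = unpack(-32767)
highest = unpack(32767)

# Does the original fill value, reused as a float, lie inside that range?
fill_inside_range = lowest <= float(PACKED_FILL) <= highest

if __name__ == "__main__":
    print(lowest, highest, fill_inside_range)
```

With these values the unpacked range runs to roughly plus or minus 4e7, so
-32768.0 sits well inside it and cannot safely mark missing data.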

In the absence of explicitly setting a fill value attribute on the
unpacked data variable I assume that the netCDF default fill value will
be used for the data type in question, which may not always be desirable
(certainly not for 32-bit floats, where the default fill value can give
rise to subtle precision-related problems).
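
One plausible reading of the precision problem, sketched with the standard
library only: the netCDF default fill value for floats is
9.9692099683868690e+36, and a test against a rounded literal such as
9.96921e+36 (the kind of value one might copy from a text listing) silently
fails. The helper name is an assumption for this example.

```python
import struct

# Default fill value for NC_FLOAT as defined in netcdf.h.
NC_FILL_FLOAT = 9.9692099683868690e+36


def as_float32(value):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# A rounded literal, e.g. typed in by hand from a printed listing.
rounded_literal = 9.96921e+36

# What actually ends up in a 32-bit float variable.
stored = as_float32(rounded_literal)

if __name__ == "__main__":
    # The stored value is the true default fill value ...
    print(stored == NC_FILL_FLOAT)
    # ... but comparing against the rounded literal does not match.
    print(stored == rounded_literal)
```

A missing-data test written as an equality against the rounded literal would
therefore treat fill values as valid data.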

With this in mind, I was wondering if there is any merit in defining a
new attribute called, say, _UnpackedFillValue (or
unpacked_missing_value)? If client software detected this attribute then
the associated value (same data type as the scale_factor and add_offset
attributes) would be used as the fill value for the unpacked data
variable.

Alternatively, the names _FillValueUnpacked (missing_value_unpacked)
might be preferable since they would then appear together pair-wise in
CDL-type listings, e.g.

short pkd_var(z, y, x) :
   ...
   pkd_var:_FillValue = -32768 ;
   pkd_var:_FillValueUnpacked = -1.0e30 ;
   pkd_var:add_offset = 42.0 ;
   pkd_var:scale_factor = 1234.0 ;
   ...
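
A minimal sketch of how client software might act on such an attribute when
unpacking, using plain Python lists rather than a netCDF library.
_FillValueUnpacked is only the attribute proposed in this message and is not
part of the CF conventions; the fallback to the netCDF default double fill
value when it is absent is likewise an assumption of this example.

```python
# Default fill value for NC_DOUBLE as defined in netcdf.h.
NC_FILL_DOUBLE = 9.9692099683868690e+36

# Attributes of the packed variable, as in the listing above.
ATTRS = {
    "_FillValue": -32768,
    "_FillValueUnpacked": -1.0e30,  # proposed attribute, not in CF
    "add_offset": 42.0,
    "scale_factor": 1234.0,
}


def unpack(packed_values, attrs):
    """Unpack scale-and-offset data, mapping packed fills to unpacked fills."""
    # Assumed fallback when the proposed attribute is not present.
    unpacked_fill = attrs.get("_FillValueUnpacked", NC_FILL_DOUBLE)
    unpacked = []
    for packed in packed_values:
        if packed == attrs["_FillValue"]:
            unpacked.append(unpacked_fill)
        else:
            unpacked.append(packed * attrs["scale_factor"] + attrs["add_offset"])
    return unpacked


if __name__ == "__main__":
    print(unpack([0, -32768, 1], ATTRS))
```

The unpacked variable written to the new file would then carry
_FillValue = -1.0e30, which lies far outside the range of the unpacked data.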


Any merit/mileage in this idea?

Phil