Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Jonathan Gregory
Dear Martin, Roy et al.

 I understand exactly what you want - or at least I think I do.  I think that 
 you would like to enter a URL representing the concept 'carbon monoxide' and 
 get back a document giving you all the Standard Names pertaining to carbon 
 monoxide.  Am I right?

I appreciate that this need is not the same one as a system for proposing
new standard names, on which I agree with the way Philip described it. But
couldn't both needs be served by a generating grammar? If we have a complete
grammar of standard_names, then we can record in the standard_name table how
each one is generated from phrases, as well as the final result. Although you
cannot always parse a standard_name unambiguously, you could look up what the
correct decomposition was, or search the table for the occurrence of a
particular species or other phrase in the decompositions.
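
As a rough illustration of recording decompositions and searching them, here is a minimal Python sketch. The table layout and the slot names (quantity, species, medium) are assumptions made for this example; they are not taken from Jonathan's grammar or from the actual standard_name table.

```python
# Hypothetical decomposition table: each standard_name is stored together with
# the phrases it was generated from. Slot names are illustrative only.
DECOMPOSITIONS = {
    "mass_fraction_of_carbon_monoxide_in_air": {
        "quantity": "mass_fraction", "species": "carbon_monoxide", "medium": "air"},
    "mole_fraction_of_carbon_monoxide_in_air": {
        "quantity": "mole_fraction", "species": "carbon_monoxide", "medium": "air"},
    "mole_fraction_of_ozone_in_air": {
        "quantity": "mole_fraction", "species": "ozone", "medium": "air"},
}


def decomposition_of(name):
    """Look up the recorded decomposition instead of re-parsing the name."""
    return DECOMPOSITIONS[name]


def names_containing(phrase, slot=None):
    """Return a sorted list of names whose decomposition uses `phrase`.

    If `slot` is given, only that slot is searched.
    """
    hits = []
    for name, parts in DECOMPOSITIONS.items():
        if slot is None:
            found = phrase in parts.values()
        else:
            found = parts.get(slot) == phrase
        if found:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    # All names pertaining to carbon monoxide, found without parsing any name.
    for name in names_containing("carbon_monoxide", slot="species"):
        print(name)
```

Because the decomposition is stored rather than inferred, the search does not depend on the name being unambiguously parseable.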

Later we could take a step further by recognising that some names are
irregular and giving their decomposition in regular terms. Thus, for instance,
specific_humidity could be recognised as composed of the semantic elements
which would normally yield mass_fraction_of_water_vapor_in_air (which does not
occur in fact). This is somewhat like Martin's idea of aliases for irregular
names. It is like recognising that went = go + -ed. Of course, there is
plenty of linguistic theory about this kind of thing! My grammar does not
currently have such transformational rules.
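
To make the "went = go + -ed" analogy concrete, the following Python sketch maps an irregular name to the regular form its semantic elements would normally yield, and then parses that form. The IRREGULAR table and the single pattern handled by parse_regular are assumptions for illustration only; this is not the transformational rule set of any existing grammar.

```python
# Hypothetical table of irregular names and the regular form that their
# semantic elements would normally yield.
IRREGULAR = {
    "specific_humidity": "mass_fraction_of_water_vapor_in_air",
}


def regular_form(name):
    """Return the regular equivalent of an irregular name, else the name itself."""
    return IRREGULAR.get(name, name)


def parse_regular(name):
    """Parse the one pattern <quantity>_of_<species>_in_<medium>."""
    quantity, of_sep, rest = name.partition("_of_")
    species, in_sep, medium = rest.rpartition("_in_")
    if not of_sep or not in_sep:
        raise ValueError(f"{name!r} does not fit <quantity>_of_<species>_in_<medium>")
    return {"quantity": quantity, "species": species, "medium": medium}


def semantic_elements(name):
    """Decompose a name, going through its regular form if it is irregular."""
    return parse_regular(regular_form(name))


if __name__ == "__main__":
    print(semantic_elements("specific_humidity"))
```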

Best wishes

Jonathan
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Lowry, Roy K.
Hello Jonathan,

I find myself in total agreement with you, and over the coming weeks I hope to 
start demonstrating how vocabulary server technology can cover both Martin's 
use case and the use case for which you did your work.

Cheers, Roy.




Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Lowry, Roy K.
Hello Simon,

If you're referring to syntactic duplicates then, provided the controlled 
vocabularies covering the grammar elements are well managed, the issue is 
addressed.

If you're referring to semantic duplicates (i.e. multiple Standard Names built 
from synonyms) then no, but there are opinions that have me 75% convinced that 
these are of little consequence in a linked data environment.
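
The distinction can be sketched in Python under one reading of it: a syntactic duplicate repeats exactly the same phrases, while a semantic duplicate differs only by a synonym. The synonym table below is invented for the example and does not come from any CF vocabulary.

```python
# Hypothetical synonym table: alternative term -> preferred term.
SYNONYMS = {"trioxygen": "ozone"}

# Decompositions of names assumed to be in the table already.
EXISTING = {("mole_fraction", "ozone", "air")}


def is_syntactic_duplicate(parts):
    """True if exactly the same phrases are already in the table."""
    return tuple(parts) in EXISTING


def is_semantic_duplicate(parts):
    """True if the phrases match an existing entry once synonyms are resolved."""
    resolved = tuple(SYNONYMS.get(p, p) for p in parts)
    return resolved in EXISTING and not is_syntactic_duplicate(parts)


if __name__ == "__main__":
    proposal = ("mole_fraction", "trioxygen", "air")
    print(is_syntactic_duplicate(proposal))  # exact match only
    print(is_semantic_duplicate(proposal))   # match after resolving synonyms
```

The second check only works if someone maintains the synonym mapping, which is the part a well-managed vocabulary alone does not provide.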

Cheers, Roy.


From: simon@csiro.au [simon@csiro.au]
Sent: 24 September 2012 04:33
To: Lowry, Roy K.; jgrayb...@ucsd.edu; cameronsmi...@llnl.gov
Cc: cf-metadata@cgd.ucar.edu; j.m.greg...@reading.ac.uk
Subject: RE: [CF-metadata] Another potentially useful extension to the  
standard_name table

Sorry if this is an ignorant/newbie question, but can I ask if the grammar for 
CF std_names implicitly provides a check on duplicates?

Simon


Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Cameron-smith, Philip
Hi Simon, et al.,

Semantic duplicates are something that CF has tried hard to avoid.  In a 
system where we control the grammar and vocabulary rather than the std_names, 
we would either need to design it so that semantic duplicates are impossible, 
or find a way to link equivalent terms.

Fortunately with my proposal this is not an issue because the primary control 
stays on the std_names.  We should also learn whether semantic duplicates will 
be a problem before we make a final leap.

Best wishes,

  Philip

---
Dr Philip Cameron-Smith, p...@llnl.gov, Lawrence Livermore National Lab.
---



Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-24 Thread Simon.Cox
Semantic duplicates are my concern. 
I understand your assertion about the linked data environment, but note that 
they are only of no consequence if everyone is doing reasoning. 

Simon


Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-22 Thread Lowry, Roy K.
Hello Philip/John,

As John might remember, I attempted this approach a while back (I think I 
started in 2004) on another parameter vocabulary (the BODC vocabulary 
subsequently adopted by SeaDataNet).  I have yet to succeed in implementing it 
operationally.  This was because of two issues:

1) I never managed to derive a single model that described all the parameters 
in the dictionary - things like 'Concentration of PCB118 per unit wet weight of 
Mytilus edulis flesh' were particularly troublesome.
2) I simply ran out of energy building some of the vocabularies and never 
completed them

Admittedly, the problem was on a different scale - the vocab I was working on 
is ten times the size of the Standard Names list with a lot of biology in it. 
Further there are now standard resources available - such as WoRMS for taxon 
names that weren't mature then.  This brings me to the point that any 
development of grammar element vocabularies in CF should not be done in 
isolation, but should be sure to incorporate resources such as WoRMS for taxa, 
EEA for atmospheric pollutants and CAS for organic molecules.

However, there are other grammar-related vocabularies in CF such as the words 
used to express the amount of something in a matrix that should be in 
vocabularies that are totally under CF governance.  I totally agree that 
establishing these based on Jonathan's grammar would be a valuable step forward 
and would be happy to help do this.  This would be even more valuable if done 
in a collaborative manner that allowed a Standard Names list to be built from 
distributed semantic elements - or even interoperable semantic elements (nudge 
nudge wink wink!!).

As Philip suggests, an ideal kick-off for this process would be for Jonathan to 
prepare a grammar for the recently published Version 20 of Standard Names list 
and this time let's see if those of us in CF interested in parameter semantics 
can give his work the development it deserves.

Cheers, Roy.


From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] On Behalf Of John Graybeal 
[jgrayb...@ucsd.edu]
Sent: 22 September 2012 00:09
To: Cameron-smith, Philip
Cc: cf-metadata@cgd.ucar.edu; Jonathan Gregory
Subject: Re: [CF-metadata] Another potentially useful extension to the  
standard_name table

I like this.

I may be a step behind, but given a grammar parser/generator, we will have 
identified the slots. But we will not have identified all the terms that can 
fill those slots.

I don't think this is a huge challenge.  We will have (a) a list of terms 
already filling those slots, (b) candidate vocabularies that we could mine -- 
or designate -- or create -- to supply additional terms.  I would be delighted 
to participate in constructing the list of terms and vocabularies.  (Especially 
if you let me use MMI to store them. Wink wink nudge nudge. :-)
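
Item (a) could be derived mechanically once decompositions exist. The Python sketch below collects the terms already filling each slot; the slot names and the sample decompositions are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical decompositions of existing names; slot names are illustrative.
EXISTING = {
    "mass_fraction_of_carbon_monoxide_in_air": {
        "quantity": "mass_fraction", "species": "carbon_monoxide", "medium": "air"},
    "mole_fraction_of_ozone_in_air": {
        "quantity": "mole_fraction", "species": "ozone", "medium": "air"},
}


def terms_by_slot(decompositions):
    """Collect, for every slot, the sorted terms already used in that slot."""
    slots = defaultdict(set)
    for parts in decompositions.values():
        for slot, term in parts.items():
            slots[slot].add(term)
    return {slot: sorted(terms) for slot, terms in slots.items()}


if __name__ == "__main__":
    for slot, terms in terms_by_slot(EXISTING).items():
        print(slot, terms)
```

The resulting per-slot lists would be the starting point; item (b), choosing external vocabularies to extend them, remains a governance decision rather than a coding one.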

Anyway, please correct me if I'm missing the boat, or tell me if there's 
already a plan.

John

On Sep 21, 2012, at 15:52, Cameron-smith, Philip wrote:

 Hi All,

 I am just catching up on the backlog of CF emails.  My sense too is that this 
 discussion is trying to solve the problems caused by a lack of grammar with 
 alternatives and/or stopgaps.  My preference is to overcome the grammar/vocab 
 challenge, but I am well aware that an accepted solution has not yet emerged.

 In order to get us on the right track, I propose we take advantage of 
 Jonathan's suggestion in a way that doesn't require a full grammar/vocab 
 definition, and doesn't require any changes to the controlling CF documents.

 Specifically, I propose the following:

 1) We leverage Jonathan's grammar program into (a) a program that checks a 
 proposed std_name by parsing it to see whether it fits existing grammar/vocab 
 rules, and/or (b) a std_name generation program.

 2) Std_names are still proposed in the ordinary way, but if they have passed 
 the checker or been created through the generator then it will be easy for 
 people to accept them.  We might even move to a mode in which pre-approved 
 std_names are automatically accepted after a month, unless someone objects.
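
A checker and generator of the kind proposed in (1) might look like the Python sketch below. It handles a single pattern, and the vocabularies are small invented samples; it is not Jonathan's grammar program and makes no claim to cover the real table.

```python
import re

# Hypothetical per-slot vocabularies for one pattern:
# <quantity>_of_<species>_in_<medium>
VOCAB = {
    "quantity": ["mass_fraction", "mole_fraction"],
    "species": ["carbon_monoxide", "ozone"],
    "medium": ["air"],
}


def _alternatives(slot):
    """Build a regex alternation from the terms allowed in a slot."""
    return "|".join(re.escape(term) for term in VOCAB[slot])


PATTERN = re.compile(
    rf"(?P<quantity>{_alternatives('quantity')})"
    rf"_of_(?P<species>{_alternatives('species')})"
    rf"_in_(?P<medium>{_alternatives('medium')})"
)


def check(name):
    """Return the decomposition if the name fits the rules, else None."""
    match = PATTERN.fullmatch(name)
    return match.groupdict() if match else None


def generate(quantity, species, medium):
    """Build a name from vocabulary terms, rejecting unknown terms."""
    chosen = {"quantity": quantity, "species": species, "medium": medium}
    for slot, term in chosen.items():
        if term not in VOCAB[slot]:
            raise ValueError(f"{term!r} is not in the {slot} vocabulary")
    return f"{quantity}_of_{species}_in_{medium}"


if __name__ == "__main__":
    print(check("mole_fraction_of_ozone_in_air"))
    print(check("air_temperature"))  # does not fit this pattern
```

A name that fails the check is not rejected outright; in line with advantage (C), it would simply go through the ordinary discussion on the list.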

 This has several advantages:

 A) It will reduce time and effort by everyone to get std_names approved.
 B) Neither the parser nor the generator needs to be complete (i.e., it is OK 
 if some existing names don't comply, or there are some valid new cases they 
 don't cover).
 C) Proposals that don't fit the standard construction can still be approved, 
 and will highlight ways to complete and extend the parser/generator.
 D) Any mistakes by the parser/generator should be caught by the email list.

 I see no disadvantages other than the need for someone to create the parser 
 and/or generator, which should be technically straightforward.

 Best wishes,

   Philip


Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-22 Thread Schultz, Martin
Dear Philip, John and others,

  I take the point that a grammar approach would indeed be the solution to 
my problem. However, the grammar as it once stood, based on Jonathan's Python 
program (which works quite nicely), unfortunately doesn't help with the 
problem that I intended to solve by adding attribute tags (specifically 
compound). The problem is that the current grammar, derived from parsing the 
standard_name table, does not take semantic relations into account, but is 
strictly rule-based. Although I am not able to prove this now, my experience 
with Jonathan's tool and the associated lexicon suggests that it would require 
a major overhaul of the standard_name table to make it parseable in the sense 
that the relations among terms are not mere (computer) rule constructs, but 
make sense to the human reader. In essence, this is why I opened trac ticket 
#91. Unfortunately, I haven't found the time yet to take this any further ...

Personally, I am much less worried about the procedures for suggesting and 
accepting standard_names. I fully agree that a grammar-based approach would 
also help in this regard, but that is a different issue.

If I were in charge of creating a new standard_name table from scratch, I 
would go for a rigorous grammar-based syntax, where (sorry to bring this up 
again) the standard_name for air_temperature would be temperature_of_air in 
order to identify the relation property_of_medium, etc. Indeed, in this 
hypothetical standard_name table, one would define aliases and give them a more 
prominent role than now, i.e. it would be fine to use air_temperature 
(aliases should not be considered deprecated, as is often the case in the 
current table). The interoperable application could then look up the real 
standard_name behind the alias and find something that can indeed be parsed - 
et voilà: you get what you need, i.e. you will know that you have a property 
and a medium, and that the property is temperature and the medium is air.
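
The alias lookup described here can be sketched in a few lines of Python. Both the alias table and the rigorous name temperature_of_air belong to the hypothetical table discussed above, not to the current standard_name table.

```python
# Hypothetical alias table: familiar name -> rigorous, parseable name.
ALIASES = {"air_temperature": "temperature_of_air"}


def interpret(name):
    """Resolve an alias, then split the rigorous name into property and medium."""
    rigorous = ALIASES.get(name, name)
    prop, sep, medium = rigorous.partition("_of_")
    if not sep:
        raise ValueError(f"{rigorous!r} does not fit <property>_of_<medium>")
    return {"property": prop, "medium": medium}


if __name__ == "__main__":
    # Data producers keep writing the alias; applications resolve it.
    print(interpret("air_temperature"))
```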

Of course, I am not in charge of creating a new standard_name table (and I 
am sure no one would like me to be in charge ;-), but I hope this illustrates 
the problem we have with the current table. Sad as it seems, I really see only 
two options. A) If most people agree that a grammar-based approach is the way 
to go, then we need to start overhauling the standard_name table (trac ticket 
#91) and slowly transform it into something that makes sense (please don't 
misunderstand this phrase!). B) We leave things as they are, but then we 
would indeed have to discuss the attribute idea further, because this would 
provide a way of interpreting standard_names without having to parse them 
(which, as I hope to have made clear, is impossible at present).

  I agree with the concerns that were raised: the attributes do risk 
becoming uncontrolled and simply too numerous. However, perhaps it is not so 
bad, because standard_names usually consist of no more than 6 lexical tokens, 
and if we could agree that there should be no more than one attribute per 
lexical token (and these would be optional anyway), then it appears manageable 
and finite.

With somewhat Quichotte'sque feelings,

Martin








Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-22 Thread Lowry, Roy K.
Hello Martin,

I understand exactly what you want - or at least I think I do.  I think that 
you would like to enter a URL representing the concept 'carbon monoxide' and 
get back a document giving you all the Standard Names pertaining to carbon 
monoxide.  Am I right?

My vision - which I'm pretty sure John Graybeal shares - is of a grammar in 
which each element is populated from a controlled vocabulary comprising 
concepts that are included in a thesaurus or more likely a full-blown ontology.

Does that sound like what you need?

Cheers, Roy.


From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Schultz, 
Martin [m.schu...@fz-juelich.de]
Sent: 22 September 2012 16:26
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Another potentially useful extension to the 
standard_name table

Dear Philip, John and others,

  I take the point that indeed a grammar approach would be the solution to 
my problem. However, the grammar as it once stood based on Jonathan's python 
program (which indeed works quite nicely) unfortunately doesn't help with 
respect to the problem that I intended to solve with the addition of 
attribute tags (specifically compound). The problem is that the current 
grammar, derived from parsing the standard_name table, does not take into 
account semantic relations, but is strictly rule-based. Although I am not able 
to prove this now, the experience I gathered with Jonathan's tool and the 
associated lexicon suggests that it would require a major overhaul of the 
standard_name table in order to make it parseable in a sense that the 
relations among terms are not mere (computer) rule constructs, but make sense 
for the human reader. In essence, this is why I opened track ticket #91. 
Unfortunately, I haven't found the time yet to take this any further. ..

Personally, I am much less worried about the procedures for suggesting and 
accepting standard_names. I fully agree that a grammar-based approach would 
also help in this regard, but that is a different issue.

If I were in charge of creating a new standard_name table from scratch, I 
would go for a rigorous grammar-based syntax, where (sorry to bring this up 
again) the standard_name for air_temperature would be temperature_of_air in 
order to identify the relation propertofmedium, etc. Indeed, in this 
hypothetical standard_name table, one would define aliases and give them a more 
prominent role than now, i.e. it would be fine to use air_temperature 
(aliases should not be considered deprecated as is often the case in the 
current table). The interoperable application could then look up the real 
standard_name behind the alias and find something that can indeed be parsed - 
eh voila: you get what you need, i.e. you will know that you have a property 
and a medium, and that the property is temperature and the medium is air.

Of course, I am not in charge if creating a new standard_name table (and I 
am sure no one would like me to be in charge ;-), but I hope this illustrates 
the problem we have with the current table. Sad as it seems, I really see only 
two options: A) if most people agree that a grammar-based approach is the way 
to go, then we need to start overhauling the standard_name table (track ticket 
#91) and slowly transform it into something that makes sense (please don't 
misunderstand this phrase!). Option B): we leave things as they are, but then 
we would indeed have to further discuss the attribute idea, because this 
would provide a way of interpreting standard_names without having to parse them 
(which, as I hope to have made clear, is impossible at present).

  I agree with the concerns that were raised: the attributes risk becoming 
uncontrolled and simply too numerous. However, perhaps it is not so bad, 
because standard_names usually consist of no more than 6 lexical tokens, 
and if we could agree that there should be no more than one attribute per 
lexical token (and these would be optional anyway), then it appears 
manageable and finite.

With somewhat quixotic feelings,

Martin






Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Registered in the commercial register of the Dueren district court, No. HR B 3498
Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
Board of Directors: Prof. Dr. Achim Bachem (Chairman),
Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt



Have you seen our app yet? http://www.fz-juelich.de/app

Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-22 Thread Schultz, Martin
Hi Roy,

 Exactly! Just how can we get there?

Cheers,

Martin

-----Original Message-----
From: Lowry, Roy K. [mailto:r...@bodc.ac.uk] 
Sent: Saturday, 22 September 2012 18:24
To: Schultz, Martin; cf-metadata@cgd.ucar.edu
Subject: RE: [CF-metadata] Another potentially useful extension to the 
standard_name table

Hello Martin,

I understand exactly what you want - or at least I think I do.  I think that 
you would like to enter a URL representing the concept 'carbon monoxide' and 
get back a document giving you all the Standard Names pertaining to carbon 
monoxide.  Am I right?

My vision - which I'm pretty sure John Graybeal shares - is of a grammar in 
which each element is populated from a controlled vocabulary comprising 
concepts that are included in a thesaurus or more likely a full-blown ontology.
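
A minimal sketch of the kind of lookup described above, assuming the table records a decomposition for each standard name. The three entries and the tag names ("quantity", "species", "medium") are illustrative assumptions, not the contents of any actual vocabulary server:

```python
# Illustrative table: standard_name -> recorded decomposition.
# The tag names and the entries are assumptions made for this sketch.
TABLE = {
    "mole_fraction_of_carbon_monoxide_in_air": {
        "quantity": "mole_fraction", "species": "carbon_monoxide", "medium": "air"},
    "mass_fraction_of_carbon_monoxide_in_air": {
        "quantity": "mass_fraction", "species": "carbon_monoxide", "medium": "air"},
    "mass_fraction_of_ozone_in_air": {
        "quantity": "mass_fraction", "species": "ozone", "medium": "air"},
}


def names_for(tag, value):
    """Return every standard name whose decomposition has tag == value."""
    return sorted(name for name, parts in TABLE.items()
                  if parts.get(tag) == value)


# All names pertaining to the concept 'carbon monoxide'.
print(names_for("species", "carbon_monoxide"))
```

Because the search runs over the recorded decomposition rather than over the joined-up string, no parsing of the name itself is needed.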

Does that sound like what you need?

Cheers, Roy.


From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Schultz, 
Martin [m.schu...@fz-juelich.de]
Sent: 22 September 2012 16:26
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Another potentially useful extension to the 
standard_name table

Dear Philip, John and others,

  I take the point that a grammar approach would indeed be the solution to 
my problem. However, the grammar as it once stood, based on Jonathan's Python 
program (which indeed works quite nicely), unfortunately doesn't help with 
the problem that I intended to solve with the addition of 
attribute tags (specifically compound). The problem is that the current 
grammar, derived from parsing the standard_name table, does not take 
semantic relations into account, but is strictly rule-based. Although I am not 
able to prove this now, the experience I gathered with Jonathan's tool and the 
associated lexicon suggests that it would require a major overhaul of the 
standard_name table to make it parseable in the sense that the 
relations among terms are not mere (computer) rule constructs, but make sense 
to the human reader. In essence, this is why I opened trac ticket #91. 
Unfortunately, I haven't found the time yet to take this any further...

Personally, I am much less worried about the procedures for suggesting and 
accepting standard_names. I fully agree that a grammar-based approach would 
also help in this regard, but that is a different issue.

If I were in charge of creating a new standard_name table from scratch, I 
would go for a rigorous grammar-based syntax, where (sorry to bring this up 
again) the standard_name for air_temperature would be temperature_of_air in 
order to identify the relation property_of_medium, etc. Indeed, in this 
hypothetical standard_name table, one would define aliases and give them a more 
prominent role than now, i.e. it would be fine to use air_temperature 
(aliases should not be considered deprecated, as is often the case in the 
current table). The interoperable application could then look up the real 
standard_name behind the alias and find something that can indeed be parsed - 
et voila: you get what you need, i.e. you will know that you have a property 
and a medium, and that the property is temperature and the medium is air.
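
The alias lookup described above could be sketched roughly as follows. The alias table, the single pattern and the function name are all hypothetical; the real table has many more patterns and irregular names:

```python
import re

# Hypothetical alias table: familiar name -> regular, parseable name.
ALIASES = {"air_temperature": "temperature_of_air"}

# Hypothetical single pattern of the form <property>_of_<medium>.
PROPERTY_OF_MEDIUM = re.compile(
    r"^(?P<property>[a-z_]+?)_of_(?P<medium>[a-z_]+)$")


def decompose(name):
    """Resolve an alias, then split the regular name into its parts."""
    regular = ALIASES.get(name, name)
    match = PROPERTY_OF_MEDIUM.match(regular)
    if match is None:
        raise ValueError(f"{regular!r} does not match property_of_medium")
    return {"name": regular, **match.groupdict()}


print(decompose("air_temperature"))
```

The application keeps using air_temperature, and only the lookup step needs to know the regular form behind it.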

Of course, I am not in charge of creating a new standard_name table (and I 
am sure no one would like me to be in charge ;-), but I hope this illustrates 
the problem we have with the current table. Sad as it seems, I really see only 
two options. Option A): if most people agree that a grammar-based approach is 
the way to go, then we need to start overhauling the standard_name table (trac 
ticket #91) and slowly transform it into something that makes sense (please 
don't misunderstand this phrase!). Option B): we leave things as they are, but 
then we would indeed have to discuss the attribute idea further, because this 
would provide a way of interpreting standard_names without having to parse them 
(which, as I hope to have made clear, is impossible at present).


  I agree with the concerns that were raised: the attributes risk becoming 
uncontrolled and simply too numerous. However, perhaps it is not so bad, 
because standard_names usually consist of no more than 6 lexical tokens, 
and if we could agree that there should be no more than one attribute per 
lexical token (and these would be optional anyway), then it appears 
manageable and finite.

With somewhat quixotic feelings,

Martin







Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-18 Thread Jonathan Gregory
Dear Martin, Roy, John, Robert

Reading the last few days' emails all at once I may have skipped
important details; if so, apologies for that. I too am in favour of a grammar,
such as my earlier attempt
http://climate.ncas.ac.uk/~jonathan/CF_metadata/14.1/
Robert subsequently coded this grammar in an appropriate software language.
This grammar has only one level of patterns, but some of its lexicon could
be reduced further by having more than one level.

A grammar would be useful for constructing standard names. People 
proposing names could be offered menus that allowed them to suggest names that
followed existing patterns, or extensions to vocabulary, or new patterns.
Thus all standard_names would naturally exist both as a specification that
consists of a pattern with specific vocabulary items filling certain place-
holders (the semantic tags, in effect), and as a joined-up standard_name. It's
equivalent information. The specification could also be automatically
translated into the accompanying description, since each pattern or semantic
tag could trigger an appropriate descriptive text.

It's easier to construct standard_names than to parse them, although parsing
is possible. Hence it may be useful to give software access to the spec as
well as the joined-up name.
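
As a rough illustration of a name existing both as a specification and as a joined-up string, here is a minimal sketch. The pattern names, the placeholder names and the `build_name` helper are assumptions made for this example, not the patterns of the grammar linked above:

```python
# Hypothetical patterns; the placeholders play the role of semantic tags.
PATTERNS = {
    "property_of_medium": "{property}_of_{medium}",
    "fraction_of_species_in_medium": "{basis}_fraction_of_{species}_in_{medium}",
}


def build_name(pattern, **slots):
    """Join a specification (pattern plus vocabulary items) into one name."""
    return PATTERNS[pattern].format(**slots)


# The specification is what software would read; the name is derived from it.
spec = {
    "pattern": "fraction_of_species_in_medium",
    "slots": {"basis": "mass", "species": "water_vapor", "medium": "air"},
}
print(build_name(spec["pattern"], **spec["slots"]))
```

Going from the specification to the name is a simple substitution, whereas the reverse direction needs a parser and may be ambiguous, which is the asymmetry noted above.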

Best wishes

Jonathan


Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-18 Thread Bryan Lawrence
Hi Roy

I'd vote for having the discussion about setting up a project to deliver this 
on list ...  

Cheers
Bryan

 
 Hello Robert,
 
 To my mind, data modelling of standard names should be based on the type of 
 approach you've been advocating.
 
 Good point concerning expressions of interest.  If people want to keep the CF 
 list traffic a bit lighter then contact me directly 
 (r...@bodc.ac.uk).
 
 Cheers, Roy.
 
 From: Robert Muetzelfeldt [r.muetzelfe...@ed.ac.uk]
 Sent: 14 September 2012 10:18
 To: Lowry, Roy K.
 Cc: cf-metadata@cgd.ucar.edu; Brown, Juan
 Subject: Re: [CF-metadata] Another potentially useful extension to the 
 standard_name table
 
 Hello Roy,
 
 On 14/09/12 08:23, Lowry, Roy K. wrote:
 Dear All,
 
 I am becoming concerned that a 'design by committee' data modelling process 
 for Standard Names is unfolding on the list.
 
 The risk is that this will result in a series of disjoint extensions with 
 significant semantic overlap hung off the standard name.  I can already see 
 this happening with Martin's 'compound' concept and Jonathan's 'species' 
 concept.
 Well, that's a good reason for developing a semantic-grammar approach for 
 representing standard names, something I've been arguing for for some time.   
 It avoids these ad-hoc extensions, which are also much less expressive than a 
 grammar.   (Commenting on the XML design does not mean that I endorse the use 
 of extensions...).
 
 Such a process is the inevitable result of the 'best efforts' culture that 
 underpins CF.  For example, Martin is driven to present an XML encoding 
 rather than a use case because he knows that an encoding has more chance of 
 being taken forward than a use case.
 
 This leads me to ask the question whether there is any possibility of our doing 
 the job properly.  Who would be interested in getting involved?  Is there any 
 possibility of putting together a consortium to develop a proposal for 
 funding to do the job?  I know one golden opportunity for such a process has 
 just passed by, but others will undoubtedly come along.
 This is a good idea, and I'd be happy to join in.  What process do you want 
 for eliciting members and taking things forward: this list, or do you want 
 people to contact you off-list?
 
 Cheers,
 Robert
 
 
 Cheers, Roy.
 
 From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] 
 On Behalf Of Robert Muetzelfeldt [r.muetzelfe...@ed.ac.uk]
 Sent: 13 September 2012 10:21
 To: cf-metadata@cgd.ucar.edu
 Subject: Re: [CF-metadata] Another potentially useful extension to the 
 standard_name table
 
 Hi Martin,
 
 Is there some reason why the entry element must have a flat set of 
 sub-elements, as in your example below? It seems to me that from an XML 
 data-design point-of-view, a neater data model would be:
 <entry id="...">
   <compound_name>...</compound_name>   (see note [1] below)
   <compound_codelist>...</compound_codelist>
   <canonical_units> ... see note [2] below ...</canonical_units>
   <description>...</description>
   <attribute_list>
     <attribute status="recommended" name="emission_sector"/>
     <attribute status="recommended" name="emission_sector_reference"/>
     <attribute status="recommended" name="compound_group_members"/>
     <attribute status="optional" name="comments"/>
   </attribute_list>
 </entry>
 
 This design:
 - is more in keeping with conventional XML designs;
 - allows for additional forms of 'status' without changing the DTD/Schema;
 - facilitates processing (you are parsing *only* XML; you don't need separate 
   code to parse the text string for each of your 3 original elements).
 
 It is a matter of taste as to whether you prefer the above design of the 
 attribute element (i.e. with two (XML) attributes), or whether you would 
 prefer to have 'status' and 'name' as two sub-elements of attribute.
 
 The following notes are not directly relevant to your suggestion, but I might 
 as well make the points now:
 
 Note [1]
 A similar argument to the above applies to the two compound_... elements, 
 which would I think be better represented as:
 <compound name="..." codelist="..."/>
 or
 <compound>
   <name>...</name>
   <codelist>...</codelist>
 </compound>
 But it may be that this decision is already fixed in stone.
 
 Note [2]
 A similar argument applies to canonical_units, except here I think the 
 principled approach would be to use the W3C Units in MathML 
 (http://www.w3.org/TR/mathml-units/) or UnitsML (http://unitsml.nist.gov/), 
 since both represent a concerted effort to develop a standard for 
 representing units in a machine-processable format. I can think of several 
 reasons why people might object vigorously to either solution: the current 
 design is also already fixed in stone; it is harder to write by hand; it is 
 harder for humans

[CF-metadata] Another potentially useful extension to the standard_name table

2012-09-13 Thread Schultz, Martin
Dear all,

 during the recent discussion on compound_name as an additional tag in the 
standard_names.xml file and in relation to trac ticket #90 it occurred to me 
that another useful addition could be to express the need for certain variable 
attributes in this table as well. This refers to the attempt to create a 
CF-1.6-strict standard which would make more things mandatory in order to 
better support interoperable applications.

 One example (related to the new emission standard_names) could be:


<entry id="tendency_of_atmosphere_mass_content_of_alcohols_due_to_emission_from_industrial_processes_and_combustion">
   <compound_name>Alcohols</compound_name>
   <compound_codelist>http://rdfdata.eionet.europa.eu/airquality/components/??</compound_codelist>  # ??
   <canonical_units>kg m-2 s-1</canonical_units>
   <description>... </description>
   <required_attributes>None</required_attributes>
   <recommended_attributes>emission_sector, emission_sector_reference, 
compound_group_members</recommended_attributes>
   <optional_attributes>comments</optional_attributes>
</entry>

The idea is that "optional" corresponds to "could" in the description, 
"recommended" to "should", and "required" to "shall".
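
To show how an application might use these three elements, here is a minimal sketch that reads the comma-separated lists and reports which attributes a variable lacks. The element names follow the example above; the shortened entry id, the helper names and the sample variable attributes are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Cut-down entry using the proposed (not yet adopted) elements.
ENTRY_XML = """
<entry id="example_emission_name">
  <required_attributes>None</required_attributes>
  <recommended_attributes>emission_sector, emission_sector_reference,
compound_group_members</recommended_attributes>
  <optional_attributes>comments</optional_attributes>
</entry>
"""


def attribute_rules(entry):
    """Return {level: [attribute names]} from the comma-separated text."""
    rules = {}
    for level in ("required", "recommended", "optional"):
        text = (entry.findtext(f"{level}_attributes") or "").strip()
        if text in ("", "None"):
            rules[level] = []
        else:
            rules[level] = [name.strip() for name in text.split(",")]
    return rules


def missing(variable_attrs, rules):
    """List the required and recommended attributes a variable lacks."""
    return {level: [n for n in rules[level] if n not in variable_attrs]
            for level in ("required", "recommended")}


rules = attribute_rules(ET.fromstring(ENTRY_XML))
print(missing({"emission_sector": "industry"}, rules))
```

Note that the text of each element has to be split and cleaned by hand, and the literal string None has to be treated as a special case.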

Best regards,

Martin


PD Dr. Martin G. Schultz
IEK-8, Forschungszentrum Jülich
D-52425 Jülich
Ph: +49 2461 61 2831







Re: [CF-metadata] Another potentially useful extension to the standard_name table

2012-09-13 Thread Robert Muetzelfeldt

Hi Martin,

Is there some reason why the entry element must have a flat set of 
sub-elements, as in your example below? It seems to me that from an 
XML data-design point-of-view, a neater data model would be:

<entry id="...">
  <compound_name>...</compound_name>   (see note [1] below)
  <compound_codelist>...</compound_codelist>
  <canonical_units> ... see note [2] below ...</canonical_units>
  <description>...</description>
  <attribute_list>
    <attribute status="recommended" name="emission_sector"/>
    <attribute status="recommended" name="emission_sector_reference"/>
    <attribute status="recommended" name="compound_group_members"/>
    <attribute status="optional" name="comments"/>
  </attribute_list>
</entry>

This design:
- is more in keeping with conventional XML designs;
- allows for additional forms of 'status' without changing the DTD/Schema;
- facilitates processing (you are parsing *only* XML; you don't need 
  separate code to parse the text string for each of your 3 original 
  elements).

It is a matter of taste as to whether you prefer the above design of the 
attribute element (i.e. with two (XML) attributes), or whether you 
would prefer to have 'status' and 'name' as two sub-elements of attribute.
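
The processing argument can be sketched as follows, assuming the nested design with two XML attributes per attribute element. The entry id and the function name are placeholders for this example:

```python
import xml.etree.ElementTree as ET

# Cut-down entry in the nested design; the id is a placeholder.
ENTRY_XML = """
<entry id="example_emission_name">
  <attribute_list>
    <attribute status="recommended" name="emission_sector"/>
    <attribute status="recommended" name="emission_sector_reference"/>
    <attribute status="recommended" name="compound_group_members"/>
    <attribute status="optional" name="comments"/>
  </attribute_list>
</entry>
"""


def attributes_by_status(entry):
    """Group attribute names by status using the XML structure alone."""
    grouped = {}
    for attr in entry.iterfind("attribute_list/attribute"):
        grouped.setdefault(attr.get("status"), []).append(attr.get("name"))
    return grouped


grouped = attributes_by_status(ET.fromstring(ENTRY_XML))
print(grouped)
```

No string splitting is involved, and a new status value would show up as a new key without any change to the code.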


The following notes are not directly relevant to your suggestion, but I 
might as well make the points now:


Note [1]
A similar argument to the above applies to the two compound_... 
elements, which would I think be better represented as:

<compound name="..." codelist="..."/>
or
<compound>
  <name>...</name>
  <codelist>...</codelist>
</compound>
But it may be that this decision is already fixed in stone.

Note [2]
A similar argument applies to canonical_units, except here I think the 
principled approach would be to use the W3C Units in MathML 
(http://www.w3.org/TR/mathml-units/) or UnitsML 
(http://unitsml.nist.gov/), since both represent a concerted effort to 
develop a standard for representing units in a machine-processable 
format. I can think of several reasons why people might object 
vigorously to either solution: the current design is also already fixed 
in stone; it is harder to write by hand; it is harder for humans to 
read; it is much more verbose; and possibly quite simply that entry 
may have to have a flat list of sub-elements (as per the first sentence 
of this email). However, these standards exist for a good reason, and 
we should have a good reason for not adopting them.


Cheers,
Robert



On 13/09/12 08:09, Schultz, Martin wrote:


Dear all,

 during the recent discussion on compound_name as an additional tag 
in the standard_names.xml file and in relation to trac ticket #90 it 
occurred to me that another useful addition could be to express the 
need for certain variable attributes in this table as well. This 
refers to the attempt to create a CF-1.6-strict standard which 
would make more things mandatory in order to better support 
interoperable applications.


 One example (related to the new emission standard_names) could be:

<entry id="tendency_of_atmosphere_mass_content_of_alcohols_due_to_emission_from_industrial_processes_and_combustion">
   <compound_name>Alcohols</compound_name>
   <compound_codelist>http://rdfdata.eionet.europa.eu/airquality/components/??</compound_codelist>  # ??
   <canonical_units>kg m-2 s-1</canonical_units>
   <description>... </description>
   <required_attributes>None</required_attributes>
   <recommended_attributes>emission_sector, emission_sector_reference, 
compound_group_members</recommended_attributes>
   <optional_attributes>comments</optional_attributes>
</entry>

The idea is that "optional" corresponds to "could" in the description, 
"recommended" to "should", and "required" to "shall".


Best regards,

Martin

PD Dr. Martin G. Schultz

IEK-8, Forschungszentrum Jülich

D-52425 Jülich

Ph: +49 2461 61 2831









--
-

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.