Re: [CF-metadata] standard_name modifiers

Robert Muetzelfeldt Tue, 08 Mar 2011 05:14:27 -0800

Dear all,

Jonathan suggested having a web-based tool which can be used to check possible standard names, prior to submitting them for human approval. This could use the grammar he developed for CF-metadata names, and which he has written up at http://www.met.reading.ac.uk/~jonathan/CF_metadata/14.1/. (This grammar apparently handled all the Standard Names around when it was developed - a very impressive achievement.)

I thought it might help the discussion to implement this idea. This involved two steps:

1. Converting his grammar (as presented on Jonathan's web page) into Prolog's grammar notation.

2. Making a parser for this grammar available on the web.
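To make the idea concrete, here is a minimal sketch (in Python rather than Prolog) of a grammar-based name checker. It assumes a tiny, hypothetical grammar fragment, not Jonathan's actual rules: each non-terminal maps to a list of alternatives, names are split into words at underscores, and the parser backtracks through the alternatives top-down, much as a query on Prolog grammar rules does. Like a naive top-down parser, it does not handle left-recursive rules.

```python
# Minimal sketch of a grammar-based standard-name checker.
# GRAMMAR is a tiny HYPOTHETICAL fragment, not Jonathan's published grammar.
# Each non-terminal maps to a list of alternatives; any symbol that is not a
# key of GRAMMAR is a terminal word that must match exactly.
GRAMMAR = {
    "standard_name": [["location_prefix", "quantity"], ["quantity"]],
    "location_prefix": [["surface"], ["toa"]],
    "quantity": [
        ["air", "temperature"],
        ["upward", "heat", "flux"],
        ["downward", "heat", "flux"],
    ],
}


def parse(symbol, tokens, pos):
    """Yield every position at which `symbol` can end when started at `pos`."""
    if symbol not in GRAMMAR:  # terminal: must equal the next word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:  # try each alternative, backtracking
        yield from parse_sequence(alternative, tokens, pos)


def parse_sequence(symbols, tokens, pos):
    """Yield every end position for a sequence of symbols starting at `pos`."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], tokens, pos):
        yield from parse_sequence(symbols[1:], tokens, mid)


def conforms(name, start="standard_name"):
    """True if the whole underscore-separated name parses as `start`."""
    tokens = name.split("_")
    return any(end == len(tokens) for end in parse(start, tokens, 0))
```

For example, `conforms("surface_upward_heat_flux")` is True, while `conforms("surface_air")` is False because no quantity follows the prefix.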

The implementation of Jonathan's grammar in Prolog follows the approach which I have described previously on this mailing list, and which is written up at http://envarml.pbworks.com/w/page/8988921/Prototype+grammar+for+CF-metadata+%22standard+names%22+(Prolog+version) - the only difference being that I have now used his grammar rules rather than ones based on the CF-metadata guidelines.

I have checked the list of standard names which Jonathan used in his work against the Prolog version of his grammar, and currently 1958 out of the total of 2072 names parse correctly. The remaining 114 are probably down to slight differences in the way that filler words, such as prepositions, are handled - this is just a question of checking all the rules for consistency in how the filler words are included.
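A coverage check of this kind takes only a few lines. The sketch below assumes that some `conforms` predicate (for example, a grammar parser) is available, and also tallies the filler words in the names that fail, which may help to locate rules that handle prepositions inconsistently. The function names and the filler-word list are hypothetical. For the figures quoted above, 1958 / 2072 is about 94.5% of names parsing.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# HYPOTHETICAL filler-word list; a real check would take these from the grammar.
FILLER_WORDS = {"of", "in", "at", "to", "from", "due", "and", "for", "into", "on", "wrt"}


def coverage_report(
    names: Iterable[str], conforms: Callable[[str], bool]
) -> Tuple[int, int, List[str]]:
    """Return (number that parse, total number, names that fail to parse)."""
    names = list(names)
    failures = [name for name in names if not conforms(name)]
    return len(names) - len(failures), len(names), failures


def filler_word_counts(failures: Iterable[str]) -> Counter:
    """Count filler words in failing names, as a hint to inconsistent rules."""
    counts = Counter()
    for name in failures:
        counts.update(word for word in name.split("_") if word in FILLER_WORDS)
    return counts
```

Any checker can be passed in as `conforms`, so the same report works whether the names are tested against a Python parser or against the output of the Prolog version.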

My colleague Mark Muetzelfeldt has produced a web app which gives access to this Prolog grammar, providing browser-based access to a simple query system for checking that a proposed standard name conforms to the grammar. This is available at http://www.eco-epistemics.org/cf_metadata_grammar/. It includes the text for the complete Prolog version of Jonathan's grammar. It goes without saying that I would welcome any feedback (or, better, a decision to set up some sort of working group to take this forward). Please note that this is a highly experimental and early-stage exercise, designed primarily to explore what a grammar-based online CF-metadata checker might look like. It has been tested only in Firefox and Chrome.


A number of issues have arisen during this exercise:

1. I feel strongly that it is highly desirable to use a standard grammar notation (such as Prolog's) for representing the grammar. Apart from the benefits of using a standard approach, this makes it straightforward to handle arbitrary nesting of grammar rules (as in, say, a grammar for English), rather than Jonathan's flat set of rules.

2. In my opinion, it is far better for the name itself to contain all the information about a particular variable, rather than use a separate mechanism (modifiers). Consider a variable such as "monthly_mean_of_log_of_ratio_of_leaf_carbon_to_root_nitrogen". This is straightforward to capture in a grammar (provided it can handle the recursive nesting of mathematical functions, which most grammar notations can), and almost impossible to capture by the use of modifiers on some base Standard Name.

3. Prolog is a particularly useful platform to use for this task. It has long had a specific notation for grammar rules which is very readable and supported natively by the Prolog interpreter. Using Prolog offers several substantial benefits over other approaches, including the ability to handle more advanced grammar requirements, the ability to query the grammar and/or a collection of Standard Names directly in the Prolog interpreter, and the ease with which it can be made available as a web app.

4. One feature of Prolog which deserves special mention is that it can easily be used to automatically generate names which are valid according to the grammar - this can be done with a one-line query. This may seem useless, but in fact is a very effective way of picking up weaknesses in the grammar: if a generated name is (to the expert human) nonsense, then that can help us to refine the grammar.

5. Jonathan's grammar includes base (atomic) terms which could be further broken down, for example:

phenomenon -->
due_to_condensation_and_evaporation_from_boundary_layer_mixing
due_to_condensation_and_evaporation_from_convection
due_to_condensation_and_evaporation_from_longwave_heating
due_to_condensation_and_evaporation_from_pressure_change
due_to_condensation_and_evaporation_from_shortwave_heating
due_to_condensation_and_evaporation_from_turbulence
There is clearly scope here for some more rationalisation.

6. The current policy is (as I understand it) that each new Standard Name has to be approved individually, but that anyone can add whatever modifiers they like. If, as I suggest, the role of modifiers is incorporated into the grammar for Standard Names, then this raises issues about the approval process. I suspect it would be possible for the parser to detect which names require manual approval and which do not, according to which rules are fired, but this would need further research.

7. Ultimately I believe we need to move away from an approved list of names to an approved grammar for formulating names. However, as Jonathan has stressed, a grammar is useful - even when a manual approval process is used - for checking that names conform to agreed style rules.
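Point 2 above depends on the grammar handling recursive nesting of functions. The following minimal recursive-descent sketch uses a hypothetical vocabulary (three functions, a ratio construction, and component/pool pairs such as leaf_carbon) invented only to cover the example, and turns the example name into a nested tree. It is deterministic: once a function followed by "of" matches, it commits to that reading.

```python
# Recursive-descent sketch for nested function names.
# The vocabulary below is HYPOTHETICAL and chosen only to cover the example.
FUNCTIONS = {("monthly", "mean"): "monthly_mean", ("mean",): "mean", ("log",): "log"}
COMPONENTS = {"leaf", "root", "stem"}
POOLS = {"carbon", "nitrogen"}


class ParseError(ValueError):
    """Raised when a name does not conform to the grammar."""


def parse_name(name):
    tokens = name.split("_")
    tree, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise ParseError(f"unexpected trailing words: {tokens[pos:]}")
    return tree


def parse_expr(tokens, pos):
    """expr := function 'of' expr | 'ratio' 'of' quantity 'to' quantity | quantity"""
    # Try longer function names first so "monthly_mean" wins over "mean".
    for words, fname in sorted(FUNCTIONS.items(), key=lambda kv: -len(kv[0])):
        end = pos + len(words)
        if tuple(tokens[pos:end]) == words and tokens[end:end + 1] == ["of"]:
            argument, pos = parse_expr(tokens, end + 1)  # recursion handles nesting
            return (fname, argument), pos
    if tokens[pos:pos + 2] == ["ratio", "of"]:
        numerator, pos = parse_quantity(tokens, pos + 2)
        if tokens[pos:pos + 1] != ["to"]:
            raise ParseError("expected 'to' in ratio")
        denominator, pos = parse_quantity(tokens, pos + 1)
        return ("ratio", numerator, denominator), pos
    return parse_quantity(tokens, pos)


def parse_quantity(tokens, pos):
    """quantity := component pool, e.g. leaf_carbon"""
    if pos + 1 < len(tokens) and tokens[pos] in COMPONENTS and tokens[pos + 1] in POOLS:
        return ("quantity", f"{tokens[pos]}_{tokens[pos + 1]}"), pos + 2
    raise ParseError(f"expected a quantity at word {pos}: {tokens[pos:pos + 2]}")
```

Because `parse_expr` calls itself for the argument of each function, any depth of nesting is accepted, and the result mirrors the structure of the name: monthly_mean applied to log applied to a ratio of two quantities. A stray space such as "log_of ratio" makes the parse fail.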
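Point 4 above (generating valid names) can be sketched in the same spirit. Prolog enumerates solutions by backtracking; the Python version below, using a small hypothetical recursive grammar, instead enumerates all derivations up to an explicit depth limit so that the recursive `expr` rule terminates.

```python
from itertools import product

# HYPOTHETICAL recursive grammar: expr := func "of" expr | quantity
GRAMMAR = {
    "expr": [["func", "of", "expr"], ["quantity"]],
    "func": [["mean"], ["log"]],
    "quantity": [["leaf", "carbon"], ["root", "nitrogen"]],
}


def generate(symbol, depth):
    """Yield every word list derivable from `symbol`.

    `depth` limits how many levels of non-terminal expansion are allowed,
    which is what stops the recursive `expr` rule from running forever.
    """
    if symbol not in GRAMMAR:  # terminal word
        yield [symbol]
        return
    if depth == 0:  # out of budget for expanding a non-terminal
        return
    for alternative in GRAMMAR[symbol]:
        expansions = [list(generate(s, depth - 1)) for s in alternative]
        for parts in product(*expansions):  # every combination of sub-derivations
            yield [word for part in parts for word in part]


names = ["_".join(words) for words in generate("expr", 3)]
print(names)
```

At depth 3 this yields six names, from mean_of_leaf_carbon to root_nitrogen. Raising the depth to 4 already produces names such as "mean_of_mean_of_leaf_carbon", the kind of output an expert would flag as a weakness in the grammar.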
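The repeated prefix in point 5 above could be found mechanically. This sketch splits each atomic term at its last "_from_", groups the remainders, and suggests a parent rule plus a sub-rule in the same arrow notation as the excerpt. The sub-rule name `source_1` is hypothetical.

```python
from collections import defaultdict

# The atomic terms quoted in point 5.
ATOMIC_TERMS = [
    "due_to_condensation_and_evaporation_from_boundary_layer_mixing",
    "due_to_condensation_and_evaporation_from_convection",
    "due_to_condensation_and_evaporation_from_longwave_heating",
    "due_to_condensation_and_evaporation_from_pressure_change",
    "due_to_condensation_and_evaporation_from_shortwave_heating",
    "due_to_condensation_and_evaporation_from_turbulence",
]


def factor_on(terms, separator="_from_"):
    """Group terms by the text before their last `separator`.

    Returns (groups, unfactored): groups maps each shared head to the list of
    tails that follow it; terms without the separator are left unfactored.
    """
    groups = defaultdict(list)
    unfactored = []
    for term in terms:
        head, sep, tail = term.rpartition(separator)
        if sep:
            groups[head].append(tail)
        else:
            unfactored.append(term)
    return dict(groups), unfactored


def suggest_rules(groups, parent="phenomenon"):
    """Render each group as a parent rule plus a new sub-rule (names are hypothetical)."""
    lines = []
    for i, (head, tails) in enumerate(groups.items(), start=1):
        sub_rule = f"source_{i}"
        lines.append(f"{parent} --> {head}_from {sub_rule}")
        lines.append(f"{sub_rule} --> " + " | ".join(tails))
    return lines


groups, unfactored = factor_on(ATOMIC_TERMS)
print("\n".join(suggest_rules(groups)))
```

Splitting on a different connective (for example "_due_to_") would expose other families of terms in the same way.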
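Point 6 above suggests using the rules fired during parsing to decide whether manual approval is needed. One way this might work, sketched under stated assumptions: every grammar alternative carries a label, an open-ended rule (here a wildcard accepting any single word) is flagged as needing review, and a name is auto-accepted only if at least one complete parse avoids every flagged rule. The grammar, labels and status strings are all hypothetical.

```python
# Sketch: decide review status from the rules fired during parsing.
# Grammar, rule labels and status strings are all HYPOTHETICAL.
ANY_WORD = "<any word>"  # wildcard terminal: matches any single word

# Each alternative is (rule_label, symbols).
GRAMMAR = {
    "standard_name": [
        ("plain", ["quantity"]),
        ("with_process", ["quantity", "due", "to", "process"]),
    ],
    "quantity": [("air_temperature", ["air", "temperature"])],
    "process": [
        ("convection", ["convection"]),
        ("new_process", [ANY_WORD]),  # open-ended: an unlisted process word
    ],
}
NEEDS_REVIEW = {"new_process"}


def parse(symbol, tokens, pos):
    """Yield (end_position, labels_of_rules_fired) for each way `symbol` can match."""
    if symbol not in GRAMMAR:  # terminal word, or the wildcard
        if pos < len(tokens) and symbol in (ANY_WORD, tokens[pos]):
            yield pos + 1, ()
        return
    for label, symbols in GRAMMAR[symbol]:
        for end, fired in parse_sequence(symbols, tokens, pos):
            yield end, (label,) + fired


def parse_sequence(symbols, tokens, pos):
    """Yield (end_position, labels) for a sequence of symbols."""
    if not symbols:
        yield pos, ()
        return
    for mid, fired_first in parse(symbols[0], tokens, pos):
        for end, fired_rest in parse_sequence(symbols[1:], tokens, mid):
            yield end, fired_first + fired_rest


def review_status(name):
    tokens = name.split("_")
    derivations = [fired for end, fired in parse("standard_name", tokens, 0)
                   if end == len(tokens)]
    if not derivations:
        return "does not conform"
    # Auto-accept if at least one complete parse avoids every flagged rule.
    if any(NEEDS_REVIEW.isdisjoint(fired) for fired in derivations):
        return "auto-accept"
    return "manual review"
```

With this grammar, "air_temperature_due_to_convection" is auto-accepted because the listed process gives a parse that avoids the wildcard, while "air_temperature_due_to_volcanoes" parses only via the flagged rule and is sent for manual review.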


Cheers,
Robert


On 03/03/11 18:02, Jonathan Gregory wrote:

Dear Philip and John

I agree with what Philip says here:

We could then tweak our current practice on this mailing list so that when a
person proposes a std_name they should state (or perhaps there is a little
bit of code to check) that the proposed std_name conforms to the existing
grammar and vocabulary rules.  I think most of us would then provide only
cursory scrutiny.  Perhaps there could even be an automatic timer so that if
nobody objects within some time period (perhaps 1 month) then the name is
automatically accepted.  Essentially the default decision for conforming
names would be 'acceptance'.  I think this would also make the generation of
the text descriptions either automatic, or perhaps obsolete, in many cases
because they could be inferred from the grammar and vocabulary tables.

I could bring the grammar up to date as a starting point. I agree that it
would be possible to work out text corresponding to each phrase and thus
construct definitions, or at least a first draft of them. Units could also
be deduced automatically. I don't myself have the expertise or the time to
write scripts in support of this, to make it easy for proposers to use these
procedures e.g. on the web.
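As a rough illustration of deriving definitions and units from phrases, the sketch below pairs each entry in a hypothetical phrase table with a text template and a units rule, and composes them recursively. The definitions are invented wording rather than the official CF descriptions, and units are joined as strings without simplification (so a doubly nested tendency would give "K s-1 s-1" rather than "K s-2").

```python
# HYPOTHETICAL phrase tables: wording is invented, not the official CF text.
BASE = {
    "air_temperature": ("the temperature of the air", "K"),
    "air_pressure": ("the pressure exerted by the air", "Pa"),
}
# prefix -> (definition template, rule turning the inner units into outer units)
PREFIXES = {
    "tendency_of_": ("the rate of change with time of {}", lambda units: f"{units} s-1"),
    "change_over_time_in_": ("the change over a time interval in {}", lambda units: units),
}


def describe(name):
    """Return (draft definition, units) built from the phrases in `name`."""
    for prefix, (template, unit_rule) in PREFIXES.items():
        if name.startswith(prefix):
            text, units = describe(name[len(prefix):])  # describe the inner quantity first
            return template.format(text), unit_rule(units)
    if name in BASE:
        return BASE[name]
    raise KeyError(f"no phrase-table entry for {name!r}")
```

For example, `describe("tendency_of_air_temperature")` gives the draft definition "the rate of change with time of the temperature of the air" with units "K s-1".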


John writes:

But where we are talking about adding generic modifiers, it seems to me a more 
automated approach is possible.  If the meaning of the modifier is clear, then 
no matter what name it is applied to, the meaning of the resulting compound 
should be clear.  If that is the case, then adding that modifier to an existing 
name should be verifiable mechanically.

If this refers to the standard_name modifiers, which are separate words
appended to standard names, then in fact no approval is needed. It is fine
to add these to the standard_name attribute. That is not regarded as creating
a new standard_name. In fact the modifiers were introduced to avoid having to
add such names to the table.
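Modifiers of this kind are easy to separate mechanically, because the standard_name attribute is then simply a name followed by one blank-separated modifier word. A minimal sketch follows, assuming the four modifiers listed in Appendix C of the CF conventions at the time. The current conventions should be checked for the authoritative list.

```python
# Standard name modifiers as listed in CF Appendix C (check the current
# conventions for the authoritative list).
CF_MODIFIERS = {"detection_minimum", "number_of_observations", "standard_error", "status_flag"}


def split_standard_name(attribute):
    """Split a standard_name attribute value into (name, modifier or None)."""
    words = attribute.split()
    if len(words) == 1:
        return words[0], None
    if len(words) == 2 and words[1] in CF_MODIFIERS:
        return words[0], words[1]
    raise ValueError(f"unrecognised standard_name attribute: {attribute!r}")
```

Only the first part then needs to be looked up in the standard name table. The modifier is checked against the fixed list, which is why no separate approval is needed.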

Best wishes

Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
