Dear all,

Jonathan suggested having a web-based tool which can be used to check possible standard names, prior to submitting them for human approval. This could use the grammar he developed for CF-metadata names, and which he has written up at http://www.met.reading.ac.uk/~jonathan/CF_metadata/14.1/. (This grammar apparently handled all the Standard Names that existed when it was developed - a very impressive achievement.)

I thought it might help the discussion to implement this idea. This involved two steps:
1. Converting his grammar (as presented on Jonathan's web page) into Prolog's grammar notation.
2. Making a parser for this grammar available on the web.
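
To give a flavour of what the first step produces, here is a minimal sketch in Prolog's grammar (DCG) notation. The rules and vocabulary are made up for illustration and are not taken from Jonathan's grammar; the predicate name conforms/1 is hypothetical, and atomic_list_concat/3 assumes SWI-Prolog.

```prolog
% Illustrative mini-grammar only: NOT Jonathan's actual rules.
% A name is an optional location followed by a quantity.
standard_name --> location, quantity.

location --> [].
location --> [surface].
location --> [tropopause].

quantity --> [air, temperature].
quantity --> [air, pressure].

% conforms(+Name): split the name on underscores into a word list,
% then check that the word list parses as a standard_name.
% atomic_list_concat/3 in split mode is an SWI-Prolog built-in.
conforms(Name) :-
    atomic_list_concat(Words, '_', Name),
    once(phrase(standard_name, Words)).
```

With these rules, the query `?- conforms(surface_air_temperature).` succeeds, while `?- conforms(air_surface_temperature).` fails because the words are in an order the grammar does not allow.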

The implementation of Jonathan's grammar in Prolog follows the approach which I have described previously on this mailing list, and which is written up at http://envarml.pbworks.com/w/page/8988921/Prototype+grammar+for+CF-metadata+%22standard+names%22+(Prolog+version) - the only difference being that I have now used his grammar rules rather than ones based on the CF-metadata guidelines.

I have checked the list of standard names which Jonathan used in his work against the Prolog version of his grammar, and currently 1958 out of the total of 2072 names parse correctly. The remaining 114 are probably down to slight differences in the way that filler words, such as prepositions, are handled - this is just a question of checking all the rules for consistency in how the filler words are included.

My colleague Mark Muetzelfeldt has produced a web app which gives access to this Prolog grammar, providing browser-based access to a simple query system for checking that a proposed standard name conforms to the grammar. This is available at http://www.eco-epistemics.org/cf_metadata_grammar/. It includes the text for the complete Prolog version of Jonathan's grammar. It goes without saying that I would welcome any feedback (or, better, that we decide to set up some sort of working group to take this forward). Please note that this is a highly experimental and early-stage exercise, designed primarily to explore what a grammar-based online CF-metadata checker might look like. It has been tested only in Firefox and Chrome.

A number of issues have arisen during this exercise:

1. I feel strongly that it is highly desirable to use a standard grammar notation (such as Prolog's) for representing the grammar. Apart from the benefits of using a standard approach, this makes it straightforward to handle arbitrary nesting of grammar rules (as in, say, a grammar for English), rather than Jonathan's flat set of rules.

2. In my opinion, it is far better for the name itself to contain all the information about a particular variable, rather than use a separate mechanism (modifiers). Consider a variable such as "monthly_mean_of_log_of_ratio_of_leaf_carbon_to_root_nitrogen". This is straightforward to capture in a grammar (provided the notation can handle the recursive nesting of mathematical functions, which most can), and almost impossible to capture by the use of modifiers on some base Standard Name.
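
As a rough sketch of how recursion handles this kind of nesting, the fragment below parses the example name. The non-terminal names (quantity, transformation, component) and the predicate parses/1 are hypothetical, chosen only for this illustration, and atomic_list_concat/3 assumes SWI-Prolog.

```prolog
% Illustrative only: a quantity may be a transformation applied to
% another quantity, so functions can be nested to any depth.
quantity --> transformation, [of], quantity.
quantity --> [ratio, of], component, [to], component.
quantity --> component.

transformation --> [monthly, mean].
transformation --> [log].

component --> [leaf, carbon].
component --> [root, nitrogen].

% parses(+Name): true if the underscore-separated name is a quantity.
parses(Name) :-
    atomic_list_concat(Words, '_', Name),
    once(phrase(quantity, Words)).
```

The query `?- parses(monthly_mean_of_log_of_ratio_of_leaf_carbon_to_root_nitrogen).` succeeds. The recursive rule consumes a transformation before calling itself, so it is not left-recursive and terminates on any finite name.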

3. Prolog is a particularly useful platform to use for this task. It has long had a specific notation for grammar rules which is very readable and supported natively by the Prolog interpreter. Using Prolog offers several substantial benefits over other approaches, including the ability to handle more advanced grammar requirements, the ability to query the grammar and/or a collection of Standard Names directly in the Prolog interpreter, and the ease with which it can be made available as a web app.

4. One feature of Prolog which deserves special mention is that it can easily be used to automatically generate names which are valid according to the grammar - this can be done with a one-line query. This may seem useless, but in fact is a very effective way of picking up weaknesses in the grammar: if a generated name is (to the expert human) nonsense, then that can help us to refine the grammar.
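
A small sketch of the idea, using made-up rules rather than Jonathan's grammar (generate/1 is a hypothetical helper, and atomic_list_concat/3 assumes SWI-Prolog):

```prolog
% Illustrative only: a deliberately over-permissive grammar.
standard_name --> place, property.

place --> [sea, surface].
place --> [tropopause].

property --> [wave, height].
property --> [air, pressure].

% generate(-Name): run the grammar "backwards" to enumerate valid
% names, joining each generated word list with underscores.
generate(Name) :-
    phrase(standard_name, Words),
    atomic_list_concat(Words, '_', Name).
```

The query `?- generate(Name).` yields sea_surface_wave_height, sea_surface_air_pressure, tropopause_wave_height and tropopause_air_pressure on backtracking. The third of these is nonsense, which tells us that the rule combining place and property is too loose. Note that for a grammar with recursive rules the word list would need a length bound (for example by calling length/2 on it first) for the enumeration to be useful.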

5. Jonathan's grammar includes base (atomic) terms which could be further broken down, for example:
phenomenon -->
due_to_condensation_and_evaporation_from_boundary_layer_mixing
due_to_condensation_and_evaporation_from_convection
due_to_condensation_and_evaporation_from_longwave_heating
due_to_condensation_and_evaporation_from_pressure_change
due_to_condensation_and_evaporation_from_shortwave_heating
due_to_condensation_and_evaporation_from_turbulence
There is clearly scope here for some more rationalisation.
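
One possible rationalisation, sketched under the assumption that these terms share a common "due to <process> from <cause>" structure. The non-terminals process and cause and the predicate phenomenon_ok/1 are hypothetical names, and atomic_list_concat/3 assumes SWI-Prolog.

```prolog
% Illustrative factoring of the six atomic terms listed above.
phenomenon --> [due, to], process, [from], cause.

process --> [condensation, and, evaporation].

cause --> [boundary, layer, mixing].
cause --> [convection].
cause --> [longwave, heating].
cause --> [pressure, change].
cause --> [shortwave, heating].
cause --> [turbulence].

% phenomenon_ok(+Term): true if the underscore-separated term parses.
phenomenon_ok(Term) :-
    atomic_list_concat(Words, '_', Term),
    once(phrase(phenomenon, Words)).
```

Each of the six terms above parses with these rules, and a new process or cause then needs one extra rule rather than a whole new family of atomic terms.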

6. The current policy is (as I understand it) that each new Standard Name has to be approved individually, but that anyone can add whatever modifiers they like. If, as I suggest, the role of modifiers is incorporated into the grammar for Standard Names, then this raises issues about the approval process. I suspect it would be possible for the parser to detect which names require manual approval and which do not, according to which rules are fired, but this would need further research.
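
To make the last suggestion a little more concrete, here is a speculative sketch in which each grammar rule carries an extra argument recording which rules were used in the parse. Everything here is hypothetical: the rules, the routine/1 table of rules deemed not to need scrutiny, and the predicate names. It is not a proposal for an actual policy, and atomic_list_concat/3 assumes SWI-Prolog.

```prolog
% name(-Rules): Rules is the list of rule labels used in the parse.
name([base(Q)]) --> base(Q).
name([transform(T)|Rules]) --> transform(T), [of], name(Rules).

base(air_temperature) --> [air, temperature].
base(leaf_carbon)     --> [leaf, carbon].

transform(monthly_mean) --> [monthly, mean].
transform(log)          --> [log].

% Hypothetical policy table: rules whose use needs no manual review.
routine(transform(monthly_mean)).
routine(base(air_temperature)).

% rules_fired(+Name, -Rules): parse Name and report the rules used.
rules_fired(Name, Rules) :-
    atomic_list_concat(Words, '_', Name),
    once(phrase(name(Rules), Words)).

% needs_manual_approval(+Name): some rule used is not in the table.
needs_manual_approval(Name) :-
    rules_fired(Name, Rules),
    member(Rule, Rules),
    \+ routine(Rule), !.
```

Here `?- rules_fired(monthly_mean_of_air_temperature, R).` gives R = [transform(monthly_mean), base(air_temperature)], so that name would pass automatically, whereas log_of_leaf_carbon would be flagged for manual approval.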

7. Ultimately I believe we need to move away from an approved list of names to an approved grammar for formulating names. However, as Jonathan has stressed, a grammar is useful - even when a manual approval process is used - for checking that names conform to agreed style rules.

Cheers,
Robert


On 03/03/11 18:02, Jonathan Gregory wrote:
Dear Philip and John

I agree with what Philip says here:

We could then tweak our current practice on this mailing list so that when a
person proposes a std_name they should state (or perhaps there is a little
bit of code to check) that the proposed std_name conforms to the existing
grammar and vocabulary rules.  I think most of us would then provide only
cursory scrutiny.  Perhaps there could even be an automatic timer so that if
nobody objects within some time period (perhaps 1 month) then the name is
automatically accepted.  Essentially the default decision for conforming
names would be 'acceptance'.  I think this would also make the generation of
the text descriptions either automatic, or perhaps obsolete, in many cases
because they could be inferred from the grammar and vocabulary tables.

I could bring the grammar up to date as a starting point. I agree that it
would be possible to work out text corresponding to each phrase and thus
construct definitions, or at least a first draft of them. Units could also
be deduced automatically. I don't myself have the expertise or the time to
write scripts in support of this, to make it easy for proposers to use these
procedures e.g. on the web.


John writes:
But where we are talking about adding generic modifiers, it seems to me a more
automated approach is possible.  If the meaning of the modifier is clear, then
no matter what name it is applied to, the meaning of the resulting compound
should be clear.  If that is the case, then adding that modifier to an existing
name should be verifiable mechanically.

If this refers to the standard_name modifiers, which are separate words
appended to standard names, then in fact no approval is needed. It is fine
to add these to the standard_name attribute. That is not regarded as creating
a new standard_name. In fact the modifiers were introduced to avoid having to
add such names to the table.

Best wishes

Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
