Re: [CF-metadata] CF grammar and online tool

Robert Muetzelfeldt Wed, 09 Mar 2011 05:32:48 -0800

Dear Martin,

Thanks for the positive and insightful feedback.



On 09/03/11 12:09, Schultz, Martin wrote:

Dear Robert,

     this is great! I would definitely support any proposal to try and follow 
this route in the future.

Thanks!

However, it will require some further discussion on how to handle semantically incorrect 
names. As I understand it, the grammar can ensure that we arrive at syntactically correct 
names (which then have a fair chance of being physically meaningful), but all in all the 
matrix will remain sparse and we would need to find a way to exclude useless 
combinations of grammar terms (to come back to the example from your Prolog grammar 
description: "bone eats dog" doesn't make sense).

This type of grammar is called a "semantic grammar", because the basic terms are meaningful in a particular subject area. This contrasts with more familiar grammars for, say, English, which consist of purely syntactic terms, such as 'noun_phrase'. So there is a much better chance that arbitrary combinations of words will be semantically valid, even though we are processing phrases in a purely syntactic way. Careful choice of base terms can help.

One of the advantages of Prolog over other grammar systems is that Prolog's grammar notation is a cosmetic layer on top of raw Prolog. And Prolog is a language which is widely used for knowledge processing etc. It is therefore straightforward to add (additional) semantic constraints into the grammar rules.

The "bone eats dog" example could be handled using either of the above two approaches: either have narrower categories (e.g. 'animal' rather than 'object'), or have an "eats" grammar rule with the additional constraint that the first term must be of type animal.
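To make the second approach concrete, here is a minimal sketch in Python rather than Prolog. The lexicon, its type labels, and the function names `parse_broad` and `parse_constrained` are all made up for illustration; they are not taken from the CF grammar or from the web tool.

```python
# Hypothetical toy lexicon: each word is tagged with a semantic type.
LEXICON = {
    "dog": "animal",
    "cat": "animal",
    "fish": "animal",
    "bone": "object",
}


def parse_broad(words):
    """Purely syntactic rule: <known word> eats <known word>."""
    if len(words) != 3 or words[1] != "eats":
        return False
    subject, _, obj = words
    return subject in LEXICON and obj in LEXICON


def parse_constrained(words):
    """Same rule plus one semantic constraint: the subject must be an animal."""
    if not parse_broad(words):
        return False
    return LEXICON[words[0]] == "animal"


for phrase in ("dog eats bone", "bone eats dog"):
    words = phrase.split()
    print(phrase, parse_broad(words), parse_constrained(words))
```

Both phrases pass the purely syntactic rule, but only "dog eats bone" passes once the type constraint is added. The first approach (narrower categories) would instead change the lexicon so that only animals can appear in the subject slot at all.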


But yes, I agree: this requires further discussion.

     Related to the approval process, two points: 1) I still think a standard_name list 
will be useful to maintain (at least for a while to come), simply because it can be 
relatively easily integrated in any kind of data analysis or checking tool. If you 
had to interact with a web application each time before you want to make a plot of your 
data, you might get quite frustrated over time. This doesn't mean that the 
list could not eventually be generated automatically, but there should still be some 
"approved list" which doesn't change too frequently so that people can keep 
up with downloading it.

I sort-of agree with you in the short term, though qualifying this with the special case of 'modifiers'. If we accept that these can be incorporated into names, and that currently they do not need to go through the approval process, then that should continue.

For brevity, I did not mention in my original posting that the implementation approach which Mark adopted gives you a direct API for interrogating the web service. Therefore a human does not need to go to the web site - a program can do this automatically. Alternatively, one can install a free Prolog (I use SWI-Prolog), download the grammar rules, and run a trivial (10-line) Prolog program locally.

2) Perhaps one could redirect attention of the approval process to grammar elements 
rather than complete standard names? As the "sedimentation" discussion shows 
very nicely, adding a new term often merits a good discussion. On the other hand, if I 
copy a concept (i.e. use an existing standard name/grammar as template), such discussion 
may not be needed.

Yes, I think the logical development of a grammar-based approach is that the approval discussion will indeed relate to basic terms (and grammar rules), rather than complete Standard Names.

Here, I would indeed welcome the "timer" idea, so that new standard names would 
be accepted automatically if no one objects within a period of 1 month or so.

     If there was a web-based tool for testing new standard_names and perhaps even 
automatically "registering" them for approval, the email discussions on this 
list could be cut down to the more fundamental discussions and the discussion about those 
names that are not universally accepted.

It should be easy to add a "Submit for approval" button to the web page.

     Next: modifiers or not? Indeed, this question hinges on the approval 
process. If there is no need to approve the exact standard name, but only its 
elements, then the modifier could indeed become part of the standard_name 
(again: the individual modifiers should be agreed upon, but their (recursive) 
combination would be flexible).

Yes.

     Finally: concerning the provisional web tool. I tried to enter 
"mass_flux_of_nitrogen_oxide_in_air_due_to_emission_from_boreal_forest_fires" as a test 
case and received the answer "IS NOT" a valid standard name. OK: that's good to know, but 
is there a chance that the tool could also tell me which rule(s) are violated? That would be 
extremely helpful and probably key to success or failure of such a tool in the long run.

I agree that this is the area that needs a lot of work: e.g. displaying the parse tree for names that are valid (Prolog supports this); and providing some sort of guidance on names that fail. It's in general hard to say why something failed (or, rather, there may be many ways in which it failed), but we can still try. One can also imagine a variety of tools which help you to formulate valid names: e.g. a tool which tells you which words (or grammatical categories) can legally follow what you have entered so far.
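As a rough illustration of the "what can legally follow" idea, here is a small Python sketch. It assumes a deliberately tiny, hypothetical grammar written as a state table; the states, tokens, and the function name `check` are invented for this example and do not reflect Jonathan's grammar or the current web tool.

```python
# Hypothetical toy grammar as a finite-state machine: each state maps an
# allowed next token to the state that token leads to.
TRANSITIONS = {
    "start": {"mass_flux": "quantity", "temperature": "quantity"},
    "quantity": {"of": "of"},
    "of": {"ozone": "substance", "water": "substance"},
    "substance": {"in": "in"},
    "in": {"air": "medium", "sea_water": "medium"},
    "medium": {},
}
# States in which a name is allowed to end.
ACCEPTING = {"quantity", "substance", "medium"}


def check(name):
    """Return (is_valid, text_parsed_so_far, tokens_that_may_follow)."""
    state, rest, consumed = "start", name, []
    while rest:
        # Try longer tokens first so multi-word tokens are matched whole.
        for token in sorted(TRANSITIONS[state], key=len, reverse=True):
            if rest == token or rest.startswith(token + "_"):
                consumed.append(token)
                rest = rest[len(token) + 1:]
                state = TRANSITIONS[state][token]
                break
        else:
            # Nothing matched: report how far we got and what was expected.
            return False, "_".join(consumed), sorted(TRANSITIONS[state])
    valid = state in ACCEPTING and not name.endswith("_")
    return valid, "_".join(consumed), sorted(TRANSITIONS[state])


print(check("mass_flux_of_ozone_in_air"))
print(check("mass_flux_of_ozone_due_to_emission"))
print(check("mass_flux_of"))
```

For a name that fails, the second and third items of the result say how much of the name was understood and which tokens would have been accepted next. This only reports the first point of failure along a single path; a real grammar with alternatives could fail in several ways at once, as noted above.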


Thanks again,
Robert

Best regards,

Martin

= Dr. Martin G. Schultz, IEK-8, Forschungszentrum Jülich  =
= D-52425 Jülich, Germany                                 =
= ph: +49 (0)2461 61 2831, fax: +49 (0)2461 61 8131       =
= email: m.schu...@fz-juelich.de                          =
= web: http://www.fz-juelich.de/icg/icg-2/m_schultz       =


-- refers to:

Date: Tue, 08 Mar 2011 13:14:09 +0000
From: Robert Muetzelfeldt<r.muetzelfe...@ed.ac.uk>
Subject: Re: [CF-metadata] standard_name modifiers
To: cf-metadata@cgd.ucar.edu
Message-ID:<4d762ba1.4040...@ed.ac.uk>
Content-Type: text/plain; charset=us-ascii; format=flowed

Dear all,

Jonathan suggested having a web-based tool which can be used to check possible 
standard names, prior to
submitting them for human approval.
This could use the grammar he developed for CF-metadata names, and which he has 
written up at
http://www.met.reading.ac.uk/~jonathan/CF_metadata/14.1/   [...]

I thought it might help the discussion to implement this idea. This involved 
two steps:
1. Converting his grammar (as presented on Jonathan's web page) into Prolog's 
grammar notation.
2. Making a parser for this grammar available on the web.

The implementation of Jonathan's grammar in Prolog follows the approach which I 
have described previously
on this mailing list, and which is written up at
http://envarml.pbworks.com/w/page/8988921/Prototype+grammar+for+CF-metadata+%22standard+names%22+(Prolog+version)
- the only difference being that I have now used his grammar rules rather than 
ones based on the
CF-metadata guidelines.
[...]

_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

