Thanks Jonathan,

I was indeed responsible for introducing 'green dogs' to discussions in CF, but 
since then my experience has expanded further into biological data and, in 
particular, into the world of contaminants in biota through EMODNET and our 
work in BODC with the Sea Mammal Research Unit.  This has shown that you are 
exactly right: invalid combinations are much less of an issue for taxa.  It 
has also shown me that protection against 'green dogs' can in some 
circumstances become an unaffordable luxury.

There are a couple of points in your message where I would do things slightly 
differently.

First, I would prefer 'number_concentration_of_taxon_in_sea_water' to 
'number_concentration_of_biological_species_in_sea_water', because not all 
biological data are identified to the species level.  Often the counts are at 
the level of genus, class or even phylum.

Secondly, I think that CF setting up a controlled vocabulary for taxa is a 
duplication of effort that would cause us a lot of unnecessary work and take 
us outside our area of domain expertise.  In the marine domain, there is an 
almost universally accepted taxonomic controlled vocabulary with lashings of 
accompanying metadata that is extremely well governed by internationally 
recognised experts in the field with high quality technical governance in the 
form of tools, including a web service API.  This is the World Register of 
Marine Species (WoRMS). I fully appreciate that CF covers more than the marine 
domain, but there is an alternative governance in the form of the International 
Taxonomic Information System (ITIS), which is aimed more at terrestrial life 
than marine. If we say that names used in CF should be registered in at least 
one of these then we should be OK.
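
A minimal sketch of the "registered in at least one of these" rule, in Python. 
The two sets are hypothetical in-memory stand-ins with made-up contents, not 
extracts from WoRMS or ITIS, and the function names are illustrative only; a 
real check would query the registers themselves.

```python
# Hypothetical stand-ins for the two registers. The contents are invented for
# illustration and do not reflect what WoRMS or ITIS actually hold.
WORMS_NAMES = {"Enterococcus", "Clostridium perfringens"}
ITIS_NAMES = {"Clostridium perfringens"}

REGISTERS = {"WoRMS": WORMS_NAMES, "ITIS": ITIS_NAMES}


def registered_in(name):
    """Return the labels of every register that contains `name`."""
    return [label for label, names in REGISTERS.items() if name in names]


def is_acceptable_taxon(name):
    """A name is acceptable if at least one register contains it."""
    return bool(registered_in(name))


if __name__ == "__main__":
    for candidate in ("Enterococcus", "Clostridium perfringens", "green dog"):
        print(candidate, registered_in(candidate), is_acceptable_taxon(candidate))
```

Because the rule is "at least one", a name missing from WoRMS but present in 
ITIS would still pass, which is what lets the same check cover non-marine taxa.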

As you will see in a message that has just been released, I'm proposing taking 
this forward through a Trac ticket.

Cheers, Roy.



________________________________________
From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Jonathan 
Gregory [j.m.greg...@reading.ac.uk]
Sent: 25 March 2013 09:00
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] proposed standard names for Enterococcus 
and Clostridium perfringens

Dear all

I agree with Philip that cfu should be spelled out. I was also going to make
the same point about Roy's proposal being different from our treatment of
chemical species, which are encoded in the standard name; this system seems to
be working. One reason for keeping this approach was the "green dog" problem.
That particular phrase is actually Roy's, if I remember correctly. That is, we
wish to prevent nonsensical constructions, by approving each name which makes
(chemical) sense individually.

However Roy argues that there is an order of magnitude more biological species
to deal with than chemical. I don't think that keeping the same approach
(encoding in the standard name) would break the system, but it would make the
standard name table very large. Perhaps more importantly, if there were so
many species, I expect that data-writers would simply assume that each of the
possible combinations of pattern and species did already exist in the standard
name table, without bothering to check or have them approved. That would defeat
the object of the system of individual approval.

We don't have to follow the chemical approach. For named geographical
regions and surface area types (vegetation types etc.) we use string-valued
coordinate variables, rather like Roy proposes here. To follow that approach
we would need a new table, subsidiary to the standard name table, containing
a list of controlled names of biological species. We would use the same
approval process to add names to this list as we do for the standard name
table. (This is what we do for geographical regions and area types.) We would
then have a standard_name such as
  number_concentration_of_biological_species_in_sea_water
whose definition would note that a data variable with this standard_name must
have a string-valued auxiliary coordinate variable of biological_species
containing a valid name from the biological species table. If there is just
one species, the auxiliary coordinate variable wouldn't need a dimension,
but this construction would also allow a single data variable to contain data
for several species, by having a dimension of size greater than one.
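
The construction described above can be sketched in Python with plain
dictionaries standing in for netCDF variables. This is an illustration under
assumptions, not an agreed CF rule: the species table contents, the variable
name taxon_name, the helper check_variable, and the use of a standard_name of
biological_species on the auxiliary coordinate are all hypothetical.

```python
# Hypothetical stand-in for the proposed table of controlled species names.
SPECIES_TABLE = {"Enterococcus", "Clostridium perfringens"}

STANDARD_NAME = "number_concentration_of_biological_species_in_sea_water"


def check_variable(data_var, aux_coords):
    """Return a list of problems; an empty list means the sketch's rule is met."""
    problems = []
    if data_var.get("standard_name") != STANDARD_NAME:
        return problems  # the rule only applies to this standard_name

    # Auxiliary coordinates are listed by name in the "coordinates" attribute.
    species_coords = [
        aux_coords[name]
        for name in data_var.get("coordinates", "").split()
        if name in aux_coords
        and aux_coords[name].get("standard_name") == "biological_species"
    ]
    if not species_coords:
        problems.append("no auxiliary coordinate of biological_species")
    for coord in species_coords:
        for value in coord["values"]:
            if value not in SPECIES_TABLE:
                problems.append(f"{value!r} is not in the species table")
    return problems


# One data variable carrying two species along a dimension of size two.
aux = {
    "taxon_name": {
        "standard_name": "biological_species",
        "values": ["Enterococcus", "Clostridium perfringens"],
    }
}
var = {"standard_name": STANDARD_NAME, "coordinates": "taxon_name"}

if __name__ == "__main__":
    print(check_variable(var, aux))
```

The single-species case is the same structure with one entry in "values",
which corresponds to an auxiliary coordinate that needs no dimension.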

Cheers

Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
