Thanks, Jonathan. I was indeed responsible for introducing 'green dogs' to discussions in CF, but since then my experience has expanded further into biological data and, in particular, into the world of contaminants in biota through EMODNET and our work in BODC with the Sea Mammal Research Unit. This has shown me that you are exactly right that the possibility of invalid combinations is much less of an issue for taxa. It has also shown me that protection against 'green dogs' can, in some circumstances, become an unaffordable luxury.
There are a couple of points in your message where I would do things slightly differently. First, I would prefer 'number_concentration_of_taxon_in_sea_water' to 'number_concentration_of_biological_species_in_sea_water', because not all biological data are identified to the species level; counts are often at the level of genus, class or even phylum. Secondly, I think that CF setting up its own controlled vocabulary for taxa would be an unnecessary duplication that would cause us a lot of unnecessary work and take us out of our domain expertise comfort zone. In the marine domain there is an almost universally accepted taxonomic controlled vocabulary, with lashings of accompanying metadata, that is extremely well governed by internationally recognised experts in the field, with high-quality technical governance in the form of tools, including a web service API. This is the World Register of Marine Species (WoRMS). I fully appreciate that CF covers more than the marine domain, but there is an alternative governance in the form of the Integrated Taxonomic Information System (ITIS), which is aimed more at terrestrial than marine life. If we require that names used in CF be registered in at least one of these, we should be OK. As you will see in a message that has just been released, I'm proposing to take this forward through a Trac ticket.

Cheers,
Roy.

________________________________________
From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Jonathan Gregory [j.m.greg...@reading.ac.uk]
Sent: 25 March 2013 09:00
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] proposed standard names for Enterococcus and Clostridium perfringens

Dear all

I agree with Philip that cfu should be spelled out. I was also going to make the same point about Roy's proposal being different from our treatment of chemical species, which are encoded in the standard name; this system seems to be working. One reason for keeping this approach was the "green dog" problem.
That particular phrase is actually Roy's, if I remember correctly. That is, we wish to prevent nonsensical constructions by individually approving each name that makes (chemical) sense. However, Roy argues that there are an order of magnitude more biological species to deal with than chemical ones. I don't think that keeping the same approach (encoding in the standard name) would break the system, but it would make the standard name table very large. Perhaps more importantly, if there were so many species, I expect that data-writers would simply assume that every possible combination of pattern and species already existed in the standard name table, without bothering to check or to have them approved. That would defeat the object of the system of individual approval.

We don't have to follow the chemical approach. For named geographical regions and surface area types (vegetation types etc.) we use string-valued coordinate variables, rather like Roy proposes here. To follow that approach we would need a new table, subsidiary to the standard name table, containing a list of controlled names of biological species. We would use the same approval process to add names to this list as we do for the standard name table. (This is what we do for geographical regions and area types.)

We would then have a standard_name such as number_concentration_of_biological_species_in_sea_water, whose definition would note that a data variable with this standard_name must have a string-valued auxiliary coordinate variable of biological_species containing a valid name from the biological species table. If there is just one species, the auxiliary coordinate variable wouldn't need a dimension, but this construction would also allow a single data variable to contain data for several species, by having a dimension of size greater than one.
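The rule proposed above (a data variable with this standard_name must carry a string-valued biological_species auxiliary coordinate whose values come from an approved table, either scalar for one species or with a dimension for several) can be illustrated with a minimal Python sketch. This is not CF tooling: the dataclasses, the function `check_species_coordinate` and the contents of `APPROVED_SPECIES` are hypothetical, and the data variable is simplified to a single species dimension.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Standard name and coordinate name as proposed in the message above.
STANDARD_NAME = "number_concentration_of_biological_species_in_sea_water"
COORD_NAME = "biological_species"

# Hypothetical subsidiary table of approved names; a real one would be
# maintained through the same approval process as the standard name table.
APPROVED_SPECIES = {"Enterococcus", "Clostridium perfringens"}


@dataclass
class AuxCoord:
    standard_name: str
    # A plain str models a scalar coordinate (no dimension); a list models a
    # coordinate with a species dimension of size len(values).
    values: str | list[str]


@dataclass
class DataVariable:
    standard_name: str
    data: list[float]  # simplified: one value per species along the species dimension
    aux_coords: list[AuxCoord] = field(default_factory=list)


def check_species_coordinate(var: DataVariable, approved: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the variable passes."""
    if var.standard_name != STANDARD_NAME:
        return []  # the rule only applies to this standard_name
    coords = [c for c in var.aux_coords if c.standard_name == COORD_NAME]
    if not coords:
        return [f"missing string-valued auxiliary coordinate '{COORD_NAME}'"]
    problems = []
    for coord in coords:
        is_scalar = isinstance(coord.values, str)
        names = [coord.values] if is_scalar else coord.values
        # A dimensioned coordinate must line up with the data's species dimension.
        if not is_scalar and len(names) != len(var.data):
            problems.append("species dimension does not match the data")
        # Every name must come from the approved table (no 'green dogs').
        for name in names:
            if name not in approved:
                problems.append(f"'{name}' is not in the approved species table")
    return problems


single = DataVariable(STANDARD_NAME, [12.0], [AuxCoord(COORD_NAME, "Enterococcus")])
multi = DataVariable(
    STANDARD_NAME,
    [12.0, 3.5],
    [AuxCoord(COORD_NAME, ["Enterococcus", "Clostridium perfringens"])],
)
bad = DataVariable(STANDARD_NAME, [1.0], [AuxCoord(COORD_NAME, ["green dog"])])

print(check_species_coordinate(single, APPROVED_SPECIES))  # []
print(check_species_coordinate(multi, APPROVED_SPECIES))   # []
print(check_species_coordinate(bad, APPROVED_SPECIES))
# ["'green dog' is not in the approved species table"]
```

Roy's alternative in his reply would amount to passing, as `approved`, the set of names registered in WoRMS or ITIS instead of a CF-maintained table; how such a set would be obtained is left open here.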
Cheers
Jonathan

_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material supplied to NERC may be stored in an electronic records management system.