Hello Jonathan,

The reason that I used MAXT rather than TIME is that I am trying to follow the point data conventions, with the possibility of multiple time series (along the INSTANCE dimension) of different lengths stored as padded rather than ragged arrays in a single file. Your example is restricted to a single time series, which might be a better idea for the example in the CF documentation, as it has fewer confusing distractions.
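To make the padded-versus-ragged distinction concrete, here is a minimal Python sketch of padding several time series of different lengths out to a common MAXT. The fill value, the function name `pad_series` and the sample numbers are all assumptions made for illustration; none of them comes from the thread or from the CF conventions.

```python
# Hypothetical fill value; a real file would declare its own missing-data marker.
FILL = -999.0

def pad_series(series_list, fill=FILL):
    """Pad variable-length series to a common length (MAXT)."""
    # MAXT is the length of the longest series along the INSTANCE dimension.
    maxt = max((len(s) for s in series_list), default=0)
    # Each shorter series is extended with the fill value up to MAXT.
    padded = [list(s) + [fill] * (maxt - len(s)) for s in series_list]
    return padded, maxt

# Three instances (made-up numbers) with different numbers of time steps.
series = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded, maxt = pad_series(series)
```

The result is a rectangular INSTANCE x MAXT array, at the cost of storing fill values where a ragged layout would store nothing.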
I'm afraid that biology is a less precise domain than physics. However, compared to what we had when I started with biological data management 20-odd years ago, resources like WoRMS and ITIS are a massive step forward. I've been searching for something conforming to your expectations for 20 years, but have come to the conclusion that it's an impossible dream, as it's orthogonal to the bioscience paradigm! What WoRMS/ITIS have delivered are unique and reliable identifiers for taxa, but these are not self-describing - the TSN and aphiaID are in fact integers. Using these IDs both circumvents the homonym issue (which can be infuriating: I have had many battles with a marine coral misidentified as a South American centipede because they both have the same species name) and provides a defence against the habit biologists have of changing the taxon names for a given entity over time. They have also done a lot to standardise the spelling of taxon names, particularly issues such as discrepancies in Latin word endings (e.g. forestii versus foresti). I cannot see any alternative to including taxon_names and taxon_identifiers in parallel, and I'm relieved that you are reasonably comfortable with the idea.

Cheers, Roy.

________________________________________
From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Jonathan Gregory [j.m.greg...@reading.ac.uk]
Sent: 02 April 2013 17:38
To: cf-metadata@cgd.ucar.edu
Subject: [CF-metadata] Taxa in CF. Some questions

Dear Roy

Yes, I think you are right that it is useful to have the taxon as a dimension because it allows you to put several of them in one variable, provided it's the same quantity, with the same generic standard name. That is just like bundling up timeseries from different locations into one data variable. This kind of dimension is called a "discrete axis" in CF 1.6, section 4.5.
By "container variable" CF so far means something different: that's an empty data variable which exists to hang attributes from, to specify grid_mappings.

I assume that MAXT is the size of the time dimension, isn't it? Could we write your example like this:

  dimensions:
    time=1000;
    string80=80;
    taxon=2;
  variables:
    float abundance(time,taxon);
      abundance:standard_name="number_concentration_of_taxon_in_sea_water";
      abundance:coordinates="taxon_identifier taxon_name";
    char taxon_name(taxon,string80);
      taxon_name:standard_name="taxon_name";
    char taxon_identifier(taxon,string80);
      taxon_identifier:standard_name="taxon_identifier";

I am not sure if I've understood your example, though. Yes, I think both the taxon descriptions should be string-valued auxiliary coordinate variables, as I have shown them (CF section 6.1). If there is only one taxon, the taxon dimension could be omitted.

However, I am a bit disturbed to learn that the taxon_name might not be reliable or unique. If CF is going to depend on an external vocabulary, I would argue that it needs one which provides unique and reliable self-describing identifiers.

Best wishes

Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
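The layout in the CDL example, with two string-valued label variables sharing the taxon dimension of the data variable, can be mimicked in plain Python as a rough sketch. The taxon names, the identifiers, the abundance values and the helper names below are placeholders invented for illustration; they are not real taxa or real WoRMS/ITIS identifiers. NUL padding of the character rows is also an assumption about how the fixed-width strings are filled.

```python
STRING80 = 80  # mirrors the string80 dimension in the CDL example

def to_char_row(label, width=STRING80):
    """Encode a label as a fixed-width row of characters, NUL-padded."""
    if len(label) > width:
        raise ValueError("label longer than the string dimension")
    return list(label) + ["\x00"] * (width - len(label))

def from_char_row(row):
    """Decode a fixed-width character row back into a string."""
    return "".join(row).rstrip("\x00")

# Placeholder labels: index j in both lists describes the same taxon.
taxon_name = ["Genus alpha", "Genus beta"]
taxon_identifier = ["EXAMPLE:0001", "EXAMPLE:0002"]

# char taxon_name(taxon, string80) and char taxon_identifier(taxon, string80)
name_var = [to_char_row(n) for n in taxon_name]
id_var = [to_char_row(i) for i in taxon_identifier]

# float abundance(time, taxon): made-up values, two time steps, two taxa.
abundance = [[10.0, 0.5], [12.0, 0.7]]

def column_for_identifier(identifier):
    """Look up a taxon by identifier; return its name and abundance series."""
    j = [from_char_row(r) for r in id_var].index(identifier)
    return from_char_row(name_var[j]), [row[j] for row in abundance]
```

Selecting by identifier rather than by name is the point of carrying both arrays in parallel: the name stays available for human readers, while the lookup does not depend on it being unique or stable.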